AI Assistant Chat Interface

A project recipe — really a family of recipes — for an LLM-backed chat surface, organized along two axes: deployment shape (standalone vs embedded) and implementation path (custom vs prebuilt AI SDK). The target is an LLM-backed chat experience — either a standalone ChatGPT-style app or a third-party-mountable widget — built on the same React/Vite/Cloud Run foundations as the React SPA and React SPA with Firebase recipes. This recipe layers onto those — it does not replace them.

When to Use This Recipe

The two axes are independent. Pick a deployment shape based on where the chat lives and who owns the surrounding page; pick an implementation path based on how much UX control you need and how soon you need to ship.

The deployment shape decides hosting, identity, and isolation. The implementation path decides how much chat plumbing you write yourself.

Concern	Standalone app	Embedded widget
Primary use case	Full product surface — chat is the app	In-product copilot, support widget, sales assistant
Host integration	Own pages, own routes	`<script>` tag plus a JS API on someone else's page
Identity model	Own auth — Firebase Auth or Identity Platform	JWT handoff from the host page
Distribution	Cloud Run + nginx or Firebase Hosting	CDN-served ESM + UMD bundle
Isolation needs	None — you own the document	Shadow DOM or iframe to insulate from the host's CSS
Baseline complexity	Lower	Higher — mounting, theming, host CSP, version pinning

Concern	Prebuilt SDK (Vercel AI SDK + AI Elements)	Custom components
Streaming, markdown, tool-call rendering	Provided	Build it yourself
Theming control	shadcn-style source you own and edit	Total — every pixel
Bundle size	Medium — pulls Tailwind plus Radix primitives	Lean if you keep it lean
Time-to-MVP	Fast — days	Slow — weeks
Fit for Shadow-DOM widgets	Poor — SDKs are not Shadow-DOM-tested	Good — you control every portal target
Vendor coupling	Vercel / AI SDK conventions	None

tip

Default to Vercel AI SDK + AI Elements. The components ship as shadcn-style source — you own the files, edit them in place, and they sit on the same Tailwind base as the rest of the recommended stack. The SDK is provider-agnostic (OpenAI, Anthropic, Google, local). Deviate only for extreme bundle constraints, bespoke UX that fights AI Elements' Radix primitives, or Shadow-DOM-isolated widgets.

caution

Never call provider APIs from the browser. Provider keys (OpenAI, Anthropic, Google) are bearer tokens with full account access — anyone who reads your bundle owns your billing. Always proxy through a backend you control. See Security overview.

Choose standalone when: the chat is the primary product surface, the app shell is yours, conversation history is part of the UX, identity is single-tenant.
Choose embedded when: the chat is a support, sales, or in-product copilot inside a host app — especially a third-party host you do not own — and SSO is inherited from the host.
Choose a prebuilt SDK when: time-to-market matters, the UX is standard chat, the host stack is already Tailwind plus shadcn/ui.
Choose custom when: the widget needs Shadow-DOM isolation, the bundle target is sub-100 KB, the UX is bespoke, or the design system conflicts with AI Elements' Tailwind + Radix base.

Project Overview

This recipe targets an LLM-backed assistant — Q&A bot, in-app copilot, customer-support agent — layered on top of either the REST recipe or the Firebase recipe. It is provider-agnostic; the cookbook deliberately does not pick a model.

Decision	Choice
Rendering	Client-side SPA (or embeddable widget bundle)
Hosting (standalone)	Cloud Run + nginx or Firebase Hosting
Distribution (embedded)	Vite library mode → ESM + UMD, served from a CDN
Language	TypeScript
UI framework	React
Backend transport	Cloud Run SSE proxy or Firestore-doc-as-stream via Eventarc
AI orchestration	Vercel AI SDK (default) / Genkit / assistant-ui / CopilotKit / custom
Streaming	SSE over `fetch` + `ReadableStream`
Auth (standalone)	Firebase Authentication / Identity Platform
Auth (embedded)	JWT handoff from host via `postMessage` / `identify({ userId, jwt })`
Persistence	Firestore — `conversations/{id}/messages/{messageId}`

Tech Stack

Core

Same as React SPA on Cloud Run: React, TypeScript, Vite.

Styling & UI

Same Tailwind + shadcn/ui base as the parent recipes. AI Elements ships as shadcn-style source components, so it drops into the same base without introducing a separate component library or theme system.

State & Data

Library	Role
TanStack Query	Server state (REST variant) — conversation list, message history
Firebase JS SDK (`firebase/firestore`, `firebase/auth`)	Real-time listeners and auth (Firestore variant) — see React SPA with Firebase
Zustand	Client-only chat UI state — composer draft, scroll anchor, sidebar open/closed

The chosen transport decides which row drives server state. The REST variant uses TanStack Query against a Cloud Run endpoint; the Firestore-doc-as-stream variant uses onSnapshot listeners. Zustand sits on top in either case for composer draft, autoscroll anchor, and launcher panel state.

note

TanStack Virtual is under evaluation for chat-specific scroll management. Its anchorTo: 'end', followOnAppend, and isAtEnd() primitives can replace the scroll anchor flags currently held in Zustand, leaving Zustand to own only composer draft and sidebar state. See the TanStack Virtual chat primitives blog post for the integration approach.

AI

Library	Role
Vercel AI SDK (`ai`, `@ai-sdk/react`)	Provider-agnostic streaming, `useChat`, UI Message Streams over SSE
AI Elements	shadcn-style chat UI components — Conversation, Message, PromptInput, Response, Tool, Sources, Branch
assistant-ui	Alternative — headless Thread / Composer / BranchPicker; pairs well with LangGraph
CopilotKit	Alternative — host-state-aware copilots (`useCopilotAction`, `useCopilotReadable`)
TanStack AI	Alternative — framework-agnostic, provider-agnostic SDK with per-model type safety; built on AG-UI protocol
Firebase Genkit	Server-side orchestration on Cloud Run — flows, tools, traces
`react-markdown` + `remark-gfm` + `rehype-sanitize` + `streamdown`	Streaming-aware markdown rendering on the custom path
`shiki` or `highlight.js`	Code-block syntax highlighting

tip

Pick by fit. Vercel AI SDK + AI Elements is the default — provider-agnostic, source-distributed, Tailwind-native. Reach for assistant-ui when the UX needs first-class multi-thread state or branch pickers. Reach for CopilotKit when the chat must read or mutate host-app state through declared actions and readables. Reach for TanStack AI when provider portability and per-model type safety matter more than Vercel-ecosystem integration. Microsoft Fluent UI Copilot only on Microsoft-aligned projects. Drop to fully custom only when Shadow-DOM isolation or sub-100 KB bundle targets force the issue. See AI Tooling Triage for full evaluations.

Routing & Forms

Same as React SPA on Cloud Run: React Router, React Hook Form, Zod, date-fns, Sonner.

Build & Testing

Library	Role
Vitest	Unit and integration testing
Playwright	End-to-end testing
Vite library mode	Embedded-widget build target — produces ESM and UMD

tip

All core choices align with the Recommended Tech Stack. See individual pages for rationale and alternatives.

Architecture Overview

There are three architectures depending on which axis you pick. All three reuse infrastructure already covered by the parent recipes — only the chat-specific layer is new.

Standalone over REST + SSE

The client opens a fetch POST to /api/chat with the auth bearer attached. The backend forwards the prompt to the LLM provider and pipes the SSE stream back to the browser unchanged. The SPA renders tokens as they arrive and persists both the user and assistant messages to Firestore for history. This extends React SPA on Cloud Run — same SPA hosting, same auth, with one new streaming endpoint.

Standalone over Firestore-doc-as-stream

Firestore commit latency is roughly 100–300 ms per write, so chunks are batched at sentence or paragraph boundaries rather than per token. The trade-off: not true token-level streaming, but no SSE plumbing, real-time fan-out to multiple devices for free, and offline-ready by default. This extends React SPA with Firebase — same Eventarc wiring, just a longer-lived handler that streams to Firestore instead of writing once.

There are three structural options. A cross-origin iframe is the default for third-party hosts — total isolation, the host only needs frame-src in its CSP. A Web Component plus Shadow DOM keeps CSS and DOM isolated while sharing the JS realm — bundle weight is roughly 150–400 KB gzipped, or 50–120 KB with Preact-in-Shadow. An inline mount is first-party-only and offers no isolation. The chat backend is identical to the standalone variants; only mounting and identity differ.

Key Patterns

AI SDK Chat Loop. On the client, useChat from @ai-sdk/react is configured with new DefaultChatTransport({ api: '/api/chat' }); on the server, the route returns result.toUIMessageStreamResponse(). Typed event parts — text-delta, tool-call, tool-result, reasoning, source, finish — replace hand-rolled SSE parsing on the client. The custom path uses fetch plus a ReadableStream reader and a small SSE event parser written in-house — that is exactly what the SDK does internally.

SSE Streaming with Authenticated Requests. The native EventSource API cannot set custom headers, which rules it out for any chat endpoint that requires a bearer token. Production chat clients use fetch POST with a ReadableStream body reader and parse the SSE framing (data: lines, \n\n event boundaries) themselves. On Cloud Run set --no-cpu-throttling so streaming responses are not paused between chunks. Real-Time Features carry a 2× multiplier in the Complexity Factors.

Conversation Persistence Schema. A conversations/{conversationId} document holds title, owner UID, created and updated timestamps. A conversations/{conversationId}/messages/{messageId} subcollection holds each turn with role, content parts, status, tool calls, sources, and token counters. The same schema works for both transports — only the writer differs. The REST variant has the client write user messages and the backend write assistant messages; the Eventarc variant has Cloud Run own all assistant writes. Server state at this scale carries a 2–3× multiplier per the Complexity Factors.

Tool-Call UI. AI Elements ships a Tool component that renders the call-arguments-result lifecycle for any tool. For custom UIs, model each tool call as a state machine — pending, running, complete, errored — and render a bespoke component per tool name. Each new tool surface adds 3–8 dev-days per the Complexity Factors.

Cost Guardrails & Rate Limiting. Per-user, per-day token budgets must be enforced server-side. A common pattern is an atomic Firestore counter at usage/{uid}/{yyyymmdd} incremented inside the chat handler, or a Redis counter for higher throughput. Without this, a single malicious or buggy client can run thousands of dollars of provider bills before anyone notices — an OWASP LLM Top-10 risk.

Prompt Injection & Content Moderation. Treat all retrieved or uploaded content — RAG passages, file uploads, chat history fetched from a different user — as data, never as instructions. The system prompt is the only privileged channel. Run inbound user input and outbound assistant output through a moderation API (OpenAI Moderation or Google Safe Content Classifier) before persisting or rendering.

caution

Prompt injection is OWASP LLM Top-10 risk #1. See OWASP Top 10 for LLM Applications and Security overview.

Accessibility for Streaming. A naive aria-live="polite" region announcing every token re-reads the partial sentence on every update — unusable with a screen reader. Buffer the streamed text and push to the live region only at sentence boundaries. Screen-reader-friendly streaming adds +10–20% ("Screen reader optimization") per the Complexity Factors.

Embedded Widget Mounting & Theming. React mounts inside a shadow root via createRoot(shadowRootDiv); Tailwind needs shadow-root-aware stylesheet injection — react-shadow or react-shadow-scope are the common helpers. Radix UI portals — used by shadcn/ui Dialog, Popover, Tooltip — escape the shadow root by default; override the container prop so the portal mounts inside the shadow root, otherwise floating UI will leak into the host page's CSS scope. Most prebuilt SDKs are not Shadow-DOM-tested. For strict isolation prefer iframe or a fully custom widget; swap React for Preact (preact/compat) to recover roughly 40 KB.

caution

For third-party-host embeds, default to iframe. Cross-origin isolation is guaranteed without per-CSS-quirk debugging.

Authentication Patterns. Standalone is the same as the parent recipes — Firebase Auth or Identity Platform. Embedded with host SSO: the host page calls window.AlizChat.identify({ userId, jwt }) after its user signs in; the widget attaches that JWT to chat requests; the backend verifies it against the host's JWKS. Embedded anonymous-then-upgrade: the widget mints an anonymous Firebase Auth session for guest visitors, and on host login the host hands a custom token via postMessage and the widget calls signInWithCustomToken. Enterprise OIDC token-exchange counts as "SSO / SAML / OIDC", which carries a 2–3× multiplier per the Complexity Factors.

Security boundary

Provider keys live only on the backend — never in the SPA bundle, never in a widget bundle. The chat endpoint is the security perimeter: auth verification, rate limits, moderation, and prompt-injection mitigation all happen there. Frontend guards improve UX but never substitute for server-side enforcement. See Security overview.

Task Breakdown

The four quadrants share most of their work. The table below lists each epic once, with separate columns for the two implementation paths and an additive delta column for the embedded shape.

Epic	Standalone + Custom (dev-days)	Standalone + AI SDK (dev-days)	Embedded delta (dev-days)
Project Setup & Tooling	2–3	2–3	+1–2
Authentication	3–5	3–5	+2–4
App Shell / Launcher	2–4	2–4	+2–4
Conversation Persistence	3–5	3–5	–
Backend Chat Endpoint	3–5	3–5	–
Streaming Plumbing	3–5	1–2	–
Chat UI Primitives (message list, composer, autoscroll, markdown, code blocks)	6–10	1–2	–
Tool-Call UI	3–5	1–2	+0–2 (per host-state action)
Stop / Regenerate / Edit-Resend / Branching	3–5	1–3	–
Attachments & Multimodal Input	3–6	2–4	–
Cost Guardrails & Rate Limiting	2–4	2–4	–
Moderation & Prompt-Injection Hardening	2–4	2–4	–
Accessibility (streaming live region, keyboard)	2–3	2–3	–
Conversation Sidebar & History UX	3–5	2–4	n/a (replaced by launcher panel)
Widget Build & Mounting	–	–	+4–8 (Vite library mode, Shadow DOM or iframe, embed snippet, host CSP)
Testing	3–6	3–5	+1–2
Deployment & CI/CD	2–4	2–4	+1–2 (CDN + version pinning)

Ballpark Totals

Combo	Total effort	Duration (1 developer)	Duration (2 developers)
Standalone + Prebuilt SDK	30–55 dev-days	6–11 weeks	4–6 weeks
Standalone + Custom	45–80 dev-days	9–16 weeks	5–9 weeks
Embedded + Prebuilt SDK (iframe)	35–65 dev-days	7–13 weeks	4–7 weeks
Embedded + Custom (Shadow DOM)	55–95 dev-days	11–19 weeks	6–11 weeks

Ranges baseline against React SPA on Cloud Run's 25–45 dev-day estimate; the chat layer adds the new epics above. Apply the combining-factors rule — ×1.2 for two significant complexity multipliers stacked, ×1.3–1.5 for three or more.

caution

These are baseline estimates for a chat experience with standard scope. Apply complexity multipliers for voice input, RAG, multi-language UI, strict accessibility, branching threads, or many tool surfaces. Always present estimates as ranges.

What's Not Included

Model selection, prompt design, and prompt-engineering work — see Prompt Engineering
RAG indexing and retrieval pipeline (vector database, embeddings, chunking, ingestion jobs) — separate recipe scope
Fine-tuning, LoRA adapters, and custom-model hosting
Voice input/output (Web Speech API, Whisper, TTS) — 1.5–3× per the Complexity Factors, scope as a separate epic
Evaluation harnesses, prompt A/B testing, golden-set regression suites
Multi-agent orchestration — see Multi-Agent
UX/UI design and Figma work
Internationalization — add +10–20% if needed
WCAG accessibility audit — add +15–30%
Project management overhead (meetings, demos) — typically +20–30% (see Common Pitfalls)

Deployment Overview

Deployment depends on the deployment shape. Both shapes share the chat backend.

Standalone

SPA hosting follows the parent recipes — Firebase Hosting for the Firestore variant, or Cloud Run + nginx for the REST variant. Nothing new.

The chat backend is a separate Cloud Run service. Set --no-cpu-throttling so streaming responses are not paused between tokens, and raise the request timeout (default 5 minutes; up to 60 minutes on the second-generation execution environment) to cover long completions and tool loops. For the Firestore variant the streaming worker is the same Eventarc-triggered Cloud Run service described in React SPA with Firebase — same wiring, just a longer-lived handler that writes chunks rather than a single result.

Build with Vite library mode producing both ESM (modern bundlers) and UMD (<script> drop-in) artefacts. Serve from a versioned CDN path — Cloud Storage + Cloud CDN, or Firebase Hosting. Pin host pages to a specific version (widget.v1.2.3.js) and publish a latest alias for opt-in rolling updates.

The embed integration is a single <script> tag pointing at the UMD bundle plus a small init call exposing window.AlizChat with mount(), identify({ userId, jwt }), open(), and close(). The host page calls identify after its user signs in; the widget attaches the JWT to chat requests.

Host CSP requirements are bounded — script-src for the loader CDN, connect-src for the chat API origin, and frame-src if you ship the iframe variant. Provider API keys never leave the backend, so the host CSP only needs to allow your own domains.

tip

For third-party host embeds, default to the iframe variant. CSS isolation is guaranteed and the host only needs frame-src plus the loader CDN.

AI Assistant Chat Interface

When to Use This Recipe

Project Overview

Tech Stack

Core

Styling & UI

State & Data

AI

Routing & Forms

Build & Testing

Architecture Overview

Standalone over REST + SSE

Standalone over Firestore-doc-as-stream

Embedded widget mounting

Key Patterns

Task Breakdown

Ballpark Totals

What's Not Included

Deployment Overview

Standalone

Embedded widget

Further Reading

When to Use This Recipe​

Project Overview​

Tech Stack​

Core​

Styling & UI​

State & Data​

AI​

Routing & Forms​

Build & Testing​

Architecture Overview​

Standalone over REST + SSE​

Standalone over Firestore-doc-as-stream​

Embedded widget mounting​

Key Patterns​

Task Breakdown​

Ballpark Totals​

What's Not Included​

Deployment Overview​

Standalone​

Embedded widget​

Further Reading​

When to Use This Recipe

Project Overview

Tech Stack

Core

Styling & UI

State & Data

AI

Routing & Forms

Build & Testing

Architecture Overview

Standalone over REST + SSE

Standalone over Firestore-doc-as-stream

Embedded widget mounting

Key Patterns

Task Breakdown

Ballpark Totals

What's Not Included

Deployment Overview

Standalone

Embedded widget

Further Reading