AI Assistant Chat Interface
A project recipe — really a family of recipes — for an LLM-backed chat surface, organized along two axes: deployment shape (standalone vs embedded) and implementation path (custom vs prebuilt AI SDK). The target is an LLM-backed chat experience — either a standalone ChatGPT-style app or a third-party-mountable widget — built on the same React/Vite/Cloud Run foundations as the React SPA and React SPA with Firebase recipes. This recipe layers onto those — it does not replace them.
When to Use This Recipe
The two axes are independent. Pick a deployment shape based on where the chat lives and who owns the surrounding page; pick an implementation path based on how much UX control you need and how soon you need to ship.
The deployment shape decides hosting, identity, and isolation. The implementation path decides how much chat plumbing you write yourself.
| Concern | Standalone app | Embedded widget |
|---|---|---|
| Primary use case | Full product surface — chat is the app | In-product copilot, support widget, sales assistant |
| Host integration | Own pages, own routes | <script> tag plus a JS API on someone else's page |
| Identity model | Own auth — Firebase Auth or Identity Platform | JWT handoff from the host page |
| Distribution | Cloud Run + nginx or Firebase Hosting | CDN-served ESM + UMD bundle |
| Isolation needs | None — you own the document | Shadow DOM or iframe to insulate from the host's CSS |
| Baseline complexity | Lower | Higher — mounting, theming, host CSP, version pinning |
| Concern | Prebuilt SDK (Vercel AI SDK + AI Elements) | Custom components |
|---|---|---|
| Streaming, markdown, tool-call rendering | Provided | Build it yourself |
| Theming control | shadcn-style source you own and edit | Total — every pixel |
| Bundle size | Medium — pulls Tailwind plus Radix primitives | Lean if you keep it lean |
| Time-to-MVP | Fast — days | Slow — weeks |
| Fit for Shadow-DOM widgets | Poor — SDKs are not Shadow-DOM-tested | Good — you control every portal target |
| Vendor coupling | Vercel / AI SDK conventions | None |
Default to Vercel AI SDK + AI Elements. The components ship as shadcn-style source — you own the files, edit them in place, and they sit on the same Tailwind base as the rest of the recommended stack. The SDK is provider-agnostic (OpenAI, Anthropic, Google, local). Deviate only for extreme bundle constraints, bespoke UX that fights AI Elements' Radix primitives, or Shadow-DOM-isolated widgets.
Never call provider APIs from the browser. Provider keys (OpenAI, Anthropic, Google) are bearer tokens with full account access — anyone who reads your bundle owns your billing. Always proxy through a backend you control. See Security overview.
- Choose standalone when: the chat is the primary product surface, the app shell is yours, conversation history is part of the UX, identity is single-tenant.
- Choose embedded when: the chat is a support, sales, or in-product copilot inside a host app — especially a third-party host you do not own — and SSO is inherited from the host.
- Choose a prebuilt SDK when: time-to-market matters, the UX is standard chat, the host stack is already Tailwind plus shadcn/ui.
- Choose custom when: the widget needs Shadow-DOM isolation, the bundle target is sub-100 KB, the UX is bespoke, or the design system conflicts with AI Elements' Tailwind + Radix base.
Project Overview
This recipe targets an LLM-backed assistant — Q&A bot, in-app copilot, customer-support agent — layered on top of either the REST recipe or the Firebase recipe. It is provider-agnostic; the cookbook deliberately does not pick a model.
| Decision | Choice |
|---|---|
| Rendering | Client-side SPA (or embeddable widget bundle) |
| Hosting (standalone) | Cloud Run + nginx or Firebase Hosting |
| Distribution (embedded) | Vite library mode → ESM + UMD, served from a CDN |
| Language | TypeScript |
| UI framework | React |
| Backend transport | Cloud Run SSE proxy or Firestore-doc-as-stream via Eventarc |
| AI orchestration | Vercel AI SDK (default) / Genkit / assistant-ui / CopilotKit / custom |
| Streaming | SSE over fetch + ReadableStream |
| Auth (standalone) | Firebase Authentication / Identity Platform |
| Auth (embedded) | JWT handoff from host via postMessage / identify({ userId, jwt }) |
| Persistence | Firestore — conversations/{id}/messages/{messageId} |
Tech Stack
Core
Same as React SPA on Cloud Run: React, TypeScript, Vite.
Styling & UI
Same Tailwind + shadcn/ui base as the parent recipes. AI Elements ships as shadcn-style source components, so it drops into the same base without introducing a separate component library or theme system.
State & Data
| Library | Role |
|---|---|
| TanStack Query | Server state (REST variant) — conversation list, message history |
Firebase JS SDK (firebase/firestore, firebase/auth) | Real-time listeners and auth (Firestore variant) — see React SPA with Firebase |
| Zustand | Client-only chat UI state — composer draft, scroll anchor, sidebar open/closed |
The chosen transport decides which row drives server state. The REST variant uses TanStack Query against a Cloud Run endpoint; the Firestore-doc-as-stream variant uses onSnapshot listeners. Zustand sits on top in either case for composer draft, autoscroll anchor, and launcher panel state.
TanStack Virtual is under evaluation for chat-specific scroll management. Its anchorTo: 'end', followOnAppend, and isAtEnd() primitives can replace the scroll anchor flags currently held in Zustand, leaving Zustand to own only composer draft and sidebar state. See the TanStack Virtual chat primitives blog post for the integration approach.
AI
| Library | Role |
|---|---|
Vercel AI SDK (ai, @ai-sdk/react) | Provider-agnostic streaming, useChat, UI Message Streams over SSE |
| AI Elements | shadcn-style chat UI components — Conversation, Message, PromptInput, Response, Tool, Sources, Branch |
| assistant-ui | Alternative — headless Thread / Composer / BranchPicker; pairs well with LangGraph |
| CopilotKit | Alternative — host-state-aware copilots (useCopilotAction, useCopilotReadable) |
| TanStack AI | Alternative — framework-agnostic, provider-agnostic SDK with per-model type safety; built on AG-UI protocol |
| Firebase Genkit | Server-side orchestration on Cloud Run — flows, tools, traces |
react-markdown + remark-gfm + rehype-sanitize + streamdown | Streaming-aware markdown rendering on the custom path |
shiki or highlight.js | Code-block syntax highlighting |
Pick by fit. Vercel AI SDK + AI Elements is the default — provider-agnostic, source-distributed, Tailwind-native. Reach for assistant-ui when the UX needs first-class multi-thread state or branch pickers. Reach for CopilotKit when the chat must read or mutate host-app state through declared actions and readables. Reach for TanStack AI when provider portability and per-model type safety matter more than Vercel-ecosystem integration. Microsoft Fluent UI Copilot only on Microsoft-aligned projects. Drop to fully custom only when Shadow-DOM isolation or sub-100 KB bundle targets force the issue. See AI Tooling Triage for full evaluations.
Routing & Forms
Same as React SPA on Cloud Run: React Router, React Hook Form, Zod, date-fns, Sonner.
Build & Testing
| Library | Role |
|---|---|
| Vitest | Unit and integration testing |
| Playwright | End-to-end testing |
| Vite library mode | Embedded-widget build target — produces ESM and UMD |
All core choices align with the Recommended Tech Stack. See individual pages for rationale and alternatives.
Architecture Overview
There are three architectures depending on which axis you pick. All three reuse infrastructure already covered by the parent recipes — only the chat-specific layer is new.
Standalone over REST + SSE
The client opens a fetch POST to /api/chat with the auth bearer attached. The backend forwards the prompt to the LLM provider and pipes the SSE stream back to the browser unchanged. The SPA renders tokens as they arrive and persists both the user and assistant messages to Firestore for history. This extends React SPA on Cloud Run — same SPA hosting, same auth, with one new streaming endpoint.
Standalone over Firestore-doc-as-stream
Firestore commit latency is roughly 100–300 ms per write, so chunks are batched at sentence or paragraph boundaries rather than per token. The trade-off: not true token-level streaming, but no SSE plumbing, real-time fan-out to multiple devices for free, and offline-ready by default. This extends React SPA with Firebase — same Eventarc wiring, just a longer-lived handler that streams to Firestore instead of writing once.
Embedded widget mounting
There are three structural options. A cross-origin iframe is the default for third-party hosts — total isolation, the host only needs frame-src in its CSP. A Web Component plus Shadow DOM keeps CSS and DOM isolated while sharing the JS realm — bundle weight is roughly 150–400 KB gzipped, or 50–120 KB with Preact-in-Shadow. An inline mount is first-party-only and offers no isolation. The chat backend is identical to the standalone variants; only mounting and identity differ.
Key Patterns
AI SDK Chat Loop. On the client, useChat from @ai-sdk/react is configured with new DefaultChatTransport({ api: '/api/chat' }); on the server, the route returns result.toUIMessageStreamResponse(). Typed event parts — text-delta, tool-call, tool-result, reasoning, source, finish — replace hand-rolled SSE parsing on the client. The custom path uses fetch plus a ReadableStream reader and a small SSE event parser written in-house — that is exactly what the SDK does internally.
SSE Streaming with Authenticated Requests. The native EventSource API cannot set custom headers, which rules it out for any chat endpoint that requires a bearer token. Production chat clients use fetch POST with a ReadableStream body reader and parse the SSE framing (data: lines, \n\n event boundaries) themselves. On Cloud Run set --no-cpu-throttling so streaming responses are not paused between chunks. Real-Time Features carry a 2× multiplier in the Complexity Factors.
Conversation Persistence Schema. A conversations/{conversationId} document holds title, owner UID, created and updated timestamps. A conversations/{conversationId}/messages/{messageId} subcollection holds each turn with role, content parts, status, tool calls, sources, and token counters. The same schema works for both transports — only the writer differs. The REST variant has the client write user messages and the backend write assistant messages; the Eventarc variant has Cloud Run own all assistant writes. Server state at this scale carries a 2–3× multiplier per the Complexity Factors.
Tool-Call UI. AI Elements ships a Tool component that renders the call-arguments-result lifecycle for any tool. For custom UIs, model each tool call as a state machine — pending, running, complete, errored — and render a bespoke component per tool name. Each new tool surface adds 3–8 dev-days per the Complexity Factors.
Cost Guardrails & Rate Limiting. Per-user, per-day token budgets must be enforced server-side. A common pattern is an atomic Firestore counter at usage/{uid}/{yyyymmdd} incremented inside the chat handler, or a Redis counter for higher throughput. Without this, a single malicious or buggy client can run thousands of dollars of provider bills before anyone notices — an OWASP LLM Top-10 risk.
Prompt Injection & Content Moderation. Treat all retrieved or uploaded content — RAG passages, file uploads, chat history fetched from a different user — as data, never as instructions. The system prompt is the only privileged channel. Run inbound user input and outbound assistant output through a moderation API (OpenAI Moderation or Google Safe Content Classifier) before persisting or rendering.
Prompt injection is OWASP LLM Top-10 risk #1. See OWASP Top 10 for LLM Applications and Security overview.
Accessibility for Streaming. A naive aria-live="polite" region announcing every token re-reads the partial sentence on every update — unusable with a screen reader. Buffer the streamed text and push to the live region only at sentence boundaries. Screen-reader-friendly streaming adds +10–20% ("Screen reader optimization") per the Complexity Factors.
Embedded Widget Mounting & Theming. React mounts inside a shadow root via createRoot(shadowRootDiv); Tailwind needs shadow-root-aware stylesheet injection — react-shadow or react-shadow-scope are the common helpers. Radix UI portals — used by shadcn/ui Dialog, Popover, Tooltip — escape the shadow root by default; override the container prop so the portal mounts inside the shadow root, otherwise floating UI will leak into the host page's CSS scope. Most prebuilt SDKs are not Shadow-DOM-tested. For strict isolation prefer iframe or a fully custom widget; swap React for Preact (preact/compat) to recover roughly 40 KB.
For third-party-host embeds, default to iframe. Cross-origin isolation is guaranteed without per-CSS-quirk debugging.
Authentication Patterns. Standalone is the same as the parent recipes — Firebase Auth or Identity Platform. Embedded with host SSO: the host page calls window.AlizChat.identify({ userId, jwt }) after its user signs in; the widget attaches that JWT to chat requests; the backend verifies it against the host's JWKS. Embedded anonymous-then-upgrade: the widget mints an anonymous Firebase Auth session for guest visitors, and on host login the host hands a custom token via postMessage and the widget calls signInWithCustomToken. Enterprise OIDC token-exchange counts as "SSO / SAML / OIDC", which carries a 2–3× multiplier per the Complexity Factors.
Provider keys live only on the backend — never in the SPA bundle, never in a widget bundle. The chat endpoint is the security perimeter: auth verification, rate limits, moderation, and prompt-injection mitigation all happen there. Frontend guards improve UX but never substitute for server-side enforcement. See Security overview.
Task Breakdown
The four quadrants share most of their work. The table below lists each epic once, with separate columns for the two implementation paths and an additive delta column for the embedded shape.
| Epic | Standalone + Custom (dev-days) | Standalone + AI SDK (dev-days) | Embedded delta (dev-days) |
|---|---|---|---|
| Project Setup & Tooling | 2–3 | 2–3 | +1–2 |
| Authentication | 3–5 | 3–5 | +2–4 |
| App Shell / Launcher | 2–4 | 2–4 | +2–4 |
| Conversation Persistence | 3–5 | 3–5 | – |
| Backend Chat Endpoint | 3–5 | 3–5 | – |
| Streaming Plumbing | 3–5 | 1–2 | – |
| Chat UI Primitives (message list, composer, autoscroll, markdown, code blocks) | 6–10 | 1–2 | – |
| Tool-Call UI | 3–5 | 1–2 | +0–2 (per host-state action) |
| Stop / Regenerate / Edit-Resend / Branching | 3–5 | 1–3 | – |
| Attachments & Multimodal Input | 3–6 | 2–4 | – |
| Cost Guardrails & Rate Limiting | 2–4 | 2–4 | – |
| Moderation & Prompt-Injection Hardening | 2–4 | 2–4 | – |
| Accessibility (streaming live region, keyboard) | 2–3 | 2–3 | – |
| Conversation Sidebar & History UX | 3–5 | 2–4 | n/a (replaced by launcher panel) |
| Widget Build & Mounting | – | – | +4–8 (Vite library mode, Shadow DOM or iframe, embed snippet, host CSP) |
| Testing | 3–6 | 3–5 | +1–2 |
| Deployment & CI/CD | 2–4 | 2–4 | +1–2 (CDN + version pinning) |
Ballpark Totals
| Combo | Total effort | Duration (1 developer) | Duration (2 developers) |
|---|---|---|---|
| Standalone + Prebuilt SDK | 30–55 dev-days | 6–11 weeks | 4–6 weeks |
| Standalone + Custom | 45–80 dev-days | 9–16 weeks | 5–9 weeks |
| Embedded + Prebuilt SDK (iframe) | 35–65 dev-days | 7–13 weeks | 4–7 weeks |
| Embedded + Custom (Shadow DOM) | 55–95 dev-days | 11–19 weeks | 6–11 weeks |
Ranges baseline against React SPA on Cloud Run's 25–45 dev-day estimate; the chat layer adds the new epics above. Apply the combining-factors rule — ×1.2 for two significant complexity multipliers stacked, ×1.3–1.5 for three or more.
These are baseline estimates for a chat experience with standard scope. Apply complexity multipliers for voice input, RAG, multi-language UI, strict accessibility, branching threads, or many tool surfaces. Always present estimates as ranges.
What's Not Included
- Model selection, prompt design, and prompt-engineering work — see Prompt Engineering
- RAG indexing and retrieval pipeline (vector database, embeddings, chunking, ingestion jobs) — separate recipe scope
- Fine-tuning, LoRA adapters, and custom-model hosting
- Voice input/output (Web Speech API, Whisper, TTS) — 1.5–3× per the Complexity Factors, scope as a separate epic
- Evaluation harnesses, prompt A/B testing, golden-set regression suites
- Multi-agent orchestration — see Multi-Agent
- UX/UI design and Figma work
- Internationalization — add +10–20% if needed
- WCAG accessibility audit — add +15–30%
- Project management overhead (meetings, demos) — typically +20–30% (see Common Pitfalls)
Deployment Overview
Deployment depends on the deployment shape. Both shapes share the chat backend.
Standalone
SPA hosting follows the parent recipes — Firebase Hosting for the Firestore variant, or Cloud Run + nginx for the REST variant. Nothing new.
The chat backend is a separate Cloud Run service. Set --no-cpu-throttling so streaming responses are not paused between tokens, and raise the request timeout (default 5 minutes; up to 60 minutes on the second-generation execution environment) to cover long completions and tool loops. For the Firestore variant the streaming worker is the same Eventarc-triggered Cloud Run service described in React SPA with Firebase — same wiring, just a longer-lived handler that writes chunks rather than a single result.
Embedded widget
Build with Vite library mode producing both ESM (modern bundlers) and UMD (<script> drop-in) artefacts. Serve from a versioned CDN path — Cloud Storage + Cloud CDN, or Firebase Hosting. Pin host pages to a specific version (widget.v1.2.3.js) and publish a latest alias for opt-in rolling updates.
The embed integration is a single <script> tag pointing at the UMD bundle plus a small init call exposing window.AlizChat with mount(), identify({ userId, jwt }), open(), and close(). The host page calls identify after its user signs in; the widget attaches the JWT to chat requests.
Host CSP requirements are bounded — script-src for the loader CDN, connect-src for the chat API origin, and frame-src if you ship the iframe variant. Provider API keys never leave the backend, so the host CSP only needs to allow your own domains.
For third-party host embeds, default to the iframe variant. CSS isolation is guaranteed and the host only needs frame-src plus the loader CDN.
Further Reading
Internal docs:
- React SPA on Cloud Run — the REST-baseline recipe this cookbook layers onto
- React SPA with Firebase — the BaaS variant; Firestore-doc-as-stream backend
- Estimation — Complexity Factors — multipliers cited throughout this recipe
- Requirement Engineering — project requirements checklist
- Design Systems — choosing a design approach
- Rendering — rendering strategy comparison
- Deploy — other hosting options
- Testing Strategy — testing layers and tools
- Security overview — secrets handling, CSP, XSS
- AI — Prompt Engineering
- AI — Multi-Agent
- Recommended Tech Stack
External resources: