Skip to main content

AI Assistant Chat Interface

A project recipe — really a family of recipes — for an LLM-backed chat surface, organized along two axes: deployment shape (standalone vs embedded) and implementation path (custom vs prebuilt AI SDK). The target is an LLM-backed chat experience — either a standalone ChatGPT-style app or a third-party-mountable widget — built on the same React/Vite/Cloud Run foundations as the React SPA and React SPA with Firebase recipes. This recipe layers onto those — it does not replace them.

When to Use This Recipe

The two axes are independent. Pick a deployment shape based on where the chat lives and who owns the surrounding page; pick an implementation path based on how much UX control you need and how soon you need to ship.

The deployment shape decides hosting, identity, and isolation. The implementation path decides how much chat plumbing you write yourself.

ConcernStandalone appEmbedded widget
Primary use caseFull product surface — chat is the appIn-product copilot, support widget, sales assistant
Host integrationOwn pages, own routes<script> tag plus a JS API on someone else's page
Identity modelOwn auth — Firebase Auth or Identity PlatformJWT handoff from the host page
DistributionCloud Run + nginx or Firebase HostingCDN-served ESM + UMD bundle
Isolation needsNone — you own the documentShadow DOM or iframe to insulate from the host's CSS
Baseline complexityLowerHigher — mounting, theming, host CSP, version pinning
ConcernPrebuilt SDK (Vercel AI SDK + AI Elements)Custom components
Streaming, markdown, tool-call renderingProvidedBuild it yourself
Theming controlshadcn-style source you own and editTotal — every pixel
Bundle sizeMedium — pulls Tailwind plus Radix primitivesLean if you keep it lean
Time-to-MVPFast — daysSlow — weeks
Fit for Shadow-DOM widgetsPoor — SDKs are not Shadow-DOM-testedGood — you control every portal target
Vendor couplingVercel / AI SDK conventionsNone
tip

Default to Vercel AI SDK + AI Elements. The components ship as shadcn-style source — you own the files, edit them in place, and they sit on the same Tailwind base as the rest of the recommended stack. The SDK is provider-agnostic (OpenAI, Anthropic, Google, local). Deviate only for extreme bundle constraints, bespoke UX that fights AI Elements' Radix primitives, or Shadow-DOM-isolated widgets.

caution

Never call provider APIs from the browser. Provider keys (OpenAI, Anthropic, Google) are bearer tokens with full account access — anyone who reads your bundle owns your billing. Always proxy through a backend you control. See Security overview.

  • Choose standalone when: the chat is the primary product surface, the app shell is yours, conversation history is part of the UX, identity is single-tenant.
  • Choose embedded when: the chat is a support, sales, or in-product copilot inside a host app — especially a third-party host you do not own — and SSO is inherited from the host.
  • Choose a prebuilt SDK when: time-to-market matters, the UX is standard chat, the host stack is already Tailwind plus shadcn/ui.
  • Choose custom when: the widget needs Shadow-DOM isolation, the bundle target is sub-100 KB, the UX is bespoke, or the design system conflicts with AI Elements' Tailwind + Radix base.

Project Overview

This recipe targets an LLM-backed assistant — Q&A bot, in-app copilot, customer-support agent — layered on top of either the REST recipe or the Firebase recipe. It is provider-agnostic; the cookbook deliberately does not pick a model.

DecisionChoice
RenderingClient-side SPA (or embeddable widget bundle)
Hosting (standalone)Cloud Run + nginx or Firebase Hosting
Distribution (embedded)Vite library mode → ESM + UMD, served from a CDN
LanguageTypeScript
UI frameworkReact
Backend transportCloud Run SSE proxy or Firestore-doc-as-stream via Eventarc
AI orchestrationVercel AI SDK (default) / Genkit / assistant-ui / CopilotKit / custom
StreamingSSE over fetch + ReadableStream
Auth (standalone)Firebase Authentication / Identity Platform
Auth (embedded)JWT handoff from host via postMessage / identify({ userId, jwt })
PersistenceFirestore — conversations/{id}/messages/{messageId}

Tech Stack

Core

Same as React SPA on Cloud Run: React, TypeScript, Vite.

Styling & UI

Same Tailwind + shadcn/ui base as the parent recipes. AI Elements ships as shadcn-style source components, so it drops into the same base without introducing a separate component library or theme system.

State & Data

LibraryRole
TanStack QueryServer state (REST variant) — conversation list, message history
Firebase JS SDK (firebase/firestore, firebase/auth)Real-time listeners and auth (Firestore variant) — see React SPA with Firebase
ZustandClient-only chat UI state — composer draft, scroll anchor, sidebar open/closed

The chosen transport decides which row drives server state. The REST variant uses TanStack Query against a Cloud Run endpoint; the Firestore-doc-as-stream variant uses onSnapshot listeners. Zustand sits on top in either case for composer draft, autoscroll anchor, and launcher panel state.

note

TanStack Virtual is under evaluation for chat-specific scroll management. Its anchorTo: 'end', followOnAppend, and isAtEnd() primitives can replace the scroll anchor flags currently held in Zustand, leaving Zustand to own only composer draft and sidebar state. See the TanStack Virtual chat primitives blog post for the integration approach.

AI

LibraryRole
Vercel AI SDK (ai, @ai-sdk/react)Provider-agnostic streaming, useChat, UI Message Streams over SSE
AI Elementsshadcn-style chat UI components — Conversation, Message, PromptInput, Response, Tool, Sources, Branch
assistant-uiAlternative — headless Thread / Composer / BranchPicker; pairs well with LangGraph
CopilotKitAlternative — host-state-aware copilots (useCopilotAction, useCopilotReadable)
TanStack AIAlternative — framework-agnostic, provider-agnostic SDK with per-model type safety; built on AG-UI protocol
Firebase GenkitServer-side orchestration on Cloud Run — flows, tools, traces
react-markdown + remark-gfm + rehype-sanitize + streamdownStreaming-aware markdown rendering on the custom path
shiki or highlight.jsCode-block syntax highlighting
tip

Pick by fit. Vercel AI SDK + AI Elements is the default — provider-agnostic, source-distributed, Tailwind-native. Reach for assistant-ui when the UX needs first-class multi-thread state or branch pickers. Reach for CopilotKit when the chat must read or mutate host-app state through declared actions and readables. Reach for TanStack AI when provider portability and per-model type safety matter more than Vercel-ecosystem integration. Microsoft Fluent UI Copilot only on Microsoft-aligned projects. Drop to fully custom only when Shadow-DOM isolation or sub-100 KB bundle targets force the issue. See AI Tooling Triage for full evaluations.

Routing & Forms

Same as React SPA on Cloud Run: React Router, React Hook Form, Zod, date-fns, Sonner.

Build & Testing

LibraryRole
VitestUnit and integration testing
PlaywrightEnd-to-end testing
Vite library modeEmbedded-widget build target — produces ESM and UMD
tip

All core choices align with the Recommended Tech Stack. See individual pages for rationale and alternatives.

Architecture Overview

There are three architectures depending on which axis you pick. All three reuse infrastructure already covered by the parent recipes — only the chat-specific layer is new.

Standalone over REST + SSE

The client opens a fetch POST to /api/chat with the auth bearer attached. The backend forwards the prompt to the LLM provider and pipes the SSE stream back to the browser unchanged. The SPA renders tokens as they arrive and persists both the user and assistant messages to Firestore for history. This extends React SPA on Cloud Run — same SPA hosting, same auth, with one new streaming endpoint.

Standalone over Firestore-doc-as-stream

Firestore commit latency is roughly 100–300 ms per write, so chunks are batched at sentence or paragraph boundaries rather than per token. The trade-off: not true token-level streaming, but no SSE plumbing, real-time fan-out to multiple devices for free, and offline-ready by default. This extends React SPA with Firebase — same Eventarc wiring, just a longer-lived handler that streams to Firestore instead of writing once.

Embedded widget mounting

There are three structural options. A cross-origin iframe is the default for third-party hosts — total isolation, the host only needs frame-src in its CSP. A Web Component plus Shadow DOM keeps CSS and DOM isolated while sharing the JS realm — bundle weight is roughly 150–400 KB gzipped, or 50–120 KB with Preact-in-Shadow. An inline mount is first-party-only and offers no isolation. The chat backend is identical to the standalone variants; only mounting and identity differ.

Key Patterns

AI SDK Chat Loop. On the client, useChat from @ai-sdk/react is configured with new DefaultChatTransport({ api: '/api/chat' }); on the server, the route returns result.toUIMessageStreamResponse(). Typed event parts — text-delta, tool-call, tool-result, reasoning, source, finish — replace hand-rolled SSE parsing on the client. The custom path uses fetch plus a ReadableStream reader and a small SSE event parser written in-house — that is exactly what the SDK does internally.

SSE Streaming with Authenticated Requests. The native EventSource API cannot set custom headers, which rules it out for any chat endpoint that requires a bearer token. Production chat clients use fetch POST with a ReadableStream body reader and parse the SSE framing (data: lines, \n\n event boundaries) themselves. On Cloud Run set --no-cpu-throttling so streaming responses are not paused between chunks. Real-Time Features carry a 2× multiplier in the Complexity Factors.

Conversation Persistence Schema. A conversations/{conversationId} document holds title, owner UID, created and updated timestamps. A conversations/{conversationId}/messages/{messageId} subcollection holds each turn with role, content parts, status, tool calls, sources, and token counters. The same schema works for both transports — only the writer differs. The REST variant has the client write user messages and the backend write assistant messages; the Eventarc variant has Cloud Run own all assistant writes. Server state at this scale carries a 2–3× multiplier per the Complexity Factors.

Tool-Call UI. AI Elements ships a Tool component that renders the call-arguments-result lifecycle for any tool. For custom UIs, model each tool call as a state machine — pending, running, complete, errored — and render a bespoke component per tool name. Each new tool surface adds 3–8 dev-days per the Complexity Factors.

Cost Guardrails & Rate Limiting. Per-user, per-day token budgets must be enforced server-side. A common pattern is an atomic Firestore counter at usage/{uid}/{yyyymmdd} incremented inside the chat handler, or a Redis counter for higher throughput. Without this, a single malicious or buggy client can run thousands of dollars of provider bills before anyone notices — an OWASP LLM Top-10 risk.

Prompt Injection & Content Moderation. Treat all retrieved or uploaded content — RAG passages, file uploads, chat history fetched from a different user — as data, never as instructions. The system prompt is the only privileged channel. Run inbound user input and outbound assistant output through a moderation API (OpenAI Moderation or Google Safe Content Classifier) before persisting or rendering.

caution

Prompt injection is OWASP LLM Top-10 risk #1. See OWASP Top 10 for LLM Applications and Security overview.

Accessibility for Streaming. A naive aria-live="polite" region announcing every token re-reads the partial sentence on every update — unusable with a screen reader. Buffer the streamed text and push to the live region only at sentence boundaries. Screen-reader-friendly streaming adds +10–20% ("Screen reader optimization") per the Complexity Factors.

Embedded Widget Mounting & Theming. React mounts inside a shadow root via createRoot(shadowRootDiv); Tailwind needs shadow-root-aware stylesheet injection — react-shadow or react-shadow-scope are the common helpers. Radix UI portals — used by shadcn/ui Dialog, Popover, Tooltip — escape the shadow root by default; override the container prop so the portal mounts inside the shadow root, otherwise floating UI will leak into the host page's CSS scope. Most prebuilt SDKs are not Shadow-DOM-tested. For strict isolation prefer iframe or a fully custom widget; swap React for Preact (preact/compat) to recover roughly 40 KB.

caution

For third-party-host embeds, default to iframe. Cross-origin isolation is guaranteed without per-CSS-quirk debugging.

Authentication Patterns. Standalone is the same as the parent recipes — Firebase Auth or Identity Platform. Embedded with host SSO: the host page calls window.AlizChat.identify({ userId, jwt }) after its user signs in; the widget attaches that JWT to chat requests; the backend verifies it against the host's JWKS. Embedded anonymous-then-upgrade: the widget mints an anonymous Firebase Auth session for guest visitors, and on host login the host hands a custom token via postMessage and the widget calls signInWithCustomToken. Enterprise OIDC token-exchange counts as "SSO / SAML / OIDC", which carries a 2–3× multiplier per the Complexity Factors.

Security boundary

Provider keys live only on the backend — never in the SPA bundle, never in a widget bundle. The chat endpoint is the security perimeter: auth verification, rate limits, moderation, and prompt-injection mitigation all happen there. Frontend guards improve UX but never substitute for server-side enforcement. See Security overview.

Task Breakdown

The four quadrants share most of their work. The table below lists each epic once, with separate columns for the two implementation paths and an additive delta column for the embedded shape.

EpicStandalone + Custom (dev-days)Standalone + AI SDK (dev-days)Embedded delta (dev-days)
Project Setup & Tooling2–32–3+1–2
Authentication3–53–5+2–4
App Shell / Launcher2–42–4+2–4
Conversation Persistence3–53–5
Backend Chat Endpoint3–53–5
Streaming Plumbing3–51–2
Chat UI Primitives (message list, composer, autoscroll, markdown, code blocks)6–101–2
Tool-Call UI3–51–2+0–2 (per host-state action)
Stop / Regenerate / Edit-Resend / Branching3–51–3
Attachments & Multimodal Input3–62–4
Cost Guardrails & Rate Limiting2–42–4
Moderation & Prompt-Injection Hardening2–42–4
Accessibility (streaming live region, keyboard)2–32–3
Conversation Sidebar & History UX3–52–4n/a (replaced by launcher panel)
Widget Build & Mounting+4–8 (Vite library mode, Shadow DOM or iframe, embed snippet, host CSP)
Testing3–63–5+1–2
Deployment & CI/CD2–42–4+1–2 (CDN + version pinning)

Ballpark Totals

ComboTotal effortDuration (1 developer)Duration (2 developers)
Standalone + Prebuilt SDK30–55 dev-days6–11 weeks4–6 weeks
Standalone + Custom45–80 dev-days9–16 weeks5–9 weeks
Embedded + Prebuilt SDK (iframe)35–65 dev-days7–13 weeks4–7 weeks
Embedded + Custom (Shadow DOM)55–95 dev-days11–19 weeks6–11 weeks

Ranges baseline against React SPA on Cloud Run's 25–45 dev-day estimate; the chat layer adds the new epics above. Apply the combining-factors rule — ×1.2 for two significant complexity multipliers stacked, ×1.3–1.5 for three or more.

caution

These are baseline estimates for a chat experience with standard scope. Apply complexity multipliers for voice input, RAG, multi-language UI, strict accessibility, branching threads, or many tool surfaces. Always present estimates as ranges.

What's Not Included

  • Model selection, prompt design, and prompt-engineering work — see Prompt Engineering
  • RAG indexing and retrieval pipeline (vector database, embeddings, chunking, ingestion jobs) — separate recipe scope
  • Fine-tuning, LoRA adapters, and custom-model hosting
  • Voice input/output (Web Speech API, Whisper, TTS) — 1.5–3× per the Complexity Factors, scope as a separate epic
  • Evaluation harnesses, prompt A/B testing, golden-set regression suites
  • Multi-agent orchestration — see Multi-Agent
  • UX/UI design and Figma work
  • Internationalization — add +10–20% if needed
  • WCAG accessibility audit — add +15–30%
  • Project management overhead (meetings, demos) — typically +20–30% (see Common Pitfalls)

Deployment Overview

Deployment depends on the deployment shape. Both shapes share the chat backend.

Standalone

SPA hosting follows the parent recipes — Firebase Hosting for the Firestore variant, or Cloud Run + nginx for the REST variant. Nothing new.

The chat backend is a separate Cloud Run service. Set --no-cpu-throttling so streaming responses are not paused between tokens, and raise the request timeout (default 5 minutes; up to 60 minutes on the second-generation execution environment) to cover long completions and tool loops. For the Firestore variant the streaming worker is the same Eventarc-triggered Cloud Run service described in React SPA with Firebase — same wiring, just a longer-lived handler that writes chunks rather than a single result.

Embedded widget

Build with Vite library mode producing both ESM (modern bundlers) and UMD (<script> drop-in) artefacts. Serve from a versioned CDN path — Cloud Storage + Cloud CDN, or Firebase Hosting. Pin host pages to a specific version (widget.v1.2.3.js) and publish a latest alias for opt-in rolling updates.

The embed integration is a single <script> tag pointing at the UMD bundle plus a small init call exposing window.AlizChat with mount(), identify({ userId, jwt }), open(), and close(). The host page calls identify after its user signs in; the widget attaches the JWT to chat requests.

Host CSP requirements are bounded — script-src for the loader CDN, connect-src for the chat API origin, and frame-src if you ship the iframe variant. Provider API keys never leave the backend, so the host CSP only needs to allow your own domains.

tip

For third-party host embeds, default to the iframe variant. CSS isolation is guaranteed and the host only needs frame-src plus the loader CDN.

Further Reading

Internal docs:

External resources: