# Inside Nemetschek's Multi-Agent Copilot Setup 🤖🧩
When your product is an AI assistant, using AI to build it feels natural, but doing it well is harder than it sounds. The AI-Assisted Development section describes these patterns in the abstract. This post is what they look like after a year in production on a real Aliz frontend: a React + TypeScript chat-based AI assistant with theming, 18-language internationalization, MCP integrations, and multi-environment deploys. The codebase is large enough that no single prompt can reason about it coherently, which is the whole reason the team stopped reaching for a tool and started building a system: the same shape described in Multi-Agent Orchestration. Three layers of AI setup, a team of specialist agents, and a workflow called QRSPI hold it together.
## From Ad-Hoc to a Pipeline
A year ago the team was using Copilot the way most teams do: chat window open, paste in some context, argue with the model, ship the diff. That worked fine for isolated changes and fell apart on anything that touched more than two or three files. The failure mode was predictable. A single prompt would lose the plot halfway through a feature. Context evaporated between sessions. Two developers asking the model the same question would get two different answers, both plausibly wrong.
The fix wasn't a better prompt. It was structure. The current setup runs on GitHub Copilot's custom agent modes and reshapes day-to-day AI use into a pipeline with defined roles, defined handoffs, and defined tool access: the progression described in AI-Assisted Development, from autocomplete to chat to agents to multi-agent pipelines.
## Three Layers of AI-Assisted Coding
Everything below sits on top of a clean separation of concerns between three layers.
| Layer | Scope | Invoked | Purpose |
|---|---|---|---|
| Copilot Instructions | Always-on | Every Copilot chat, automatically | Global conventions and style rules |
| Custom Agent Modes | Per-workflow | Developer opens a specific agent | Multi-step, tool-restricted pipelines |
| Prompt Files | Per-invocation | Developer runs a named prompt | One-shot reusable tasks |
Instructions define how the AI should behave globally. Agents define what the AI does for specific workflows. Prompt files handle recurring tasks that don't need a whole pipeline. Each layer is independently maintainable, and none of them try to carry work the others should be doing.
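In a VS Code + Copilot setup, the three layers typically live in conventional locations under `.github/` (a sketch of the layout; the file names are illustrative, not the team's actual ones):

```
.github/
  copilot-instructions.md          # Layer 1: always-on conventions
  instructions/
    docs.instructions.md           # Layer 1: scoped rules for specific file types
  chatmodes/
    dev-team.chatmode.md           # Layer 2: orchestrator agent mode
    dev-researcher.chatmode.md     # Layer 2: specialist agent mode
  prompts/
    pr-dashboard.prompt.md         # Layer 3: one-shot reusable prompt
```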
### Always-On Instructions
A shared repo-level instruction file is inherited by every Copilot chat in the project. It encodes the conventions that otherwise drift across contributors: arrow-function components, no TypeScript enums, TailwindCSS without arbitrary values, currentColor in SVGs so theming keeps working. Scoped instructions kick in for specific file types; documentation files get their own formatting and linking rules, for example.
The developer doesn't see this layer. It just is. New Copilot chat, conventions already active, no setup step. This is exactly what Prompt Engineering → Workspace Instruction Files recommends, and it's the cheapest lift with the widest blast radius.
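To make those conventions concrete, here is a hypothetical component that satisfies all four rules; the component itself is invented for illustration and assumes the automatic JSX runtime:

```tsx
// Union type instead of a TypeScript enum, per the repo conventions.
type StatusVariant = 'ok' | 'warning' | 'error';

type StatusIconProps = {
  variant: StatusVariant;
};

// Arrow-function component; Tailwind utility classes only, no arbitrary values.
export const StatusIcon = ({ variant }: StatusIconProps) => (
  <span className="inline-flex items-center gap-1 text-sm">
    {/* currentColor lets the active theme drive the icon color */}
    <svg viewBox="0 0 16 16" className="h-4 w-4" fill="currentColor" aria-hidden="true">
      <circle cx="8" cy="8" r="6" />
    </svg>
    {variant}
  </span>
);
```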
## The Agent Team Model
Above the always-on layer sit three teams (Development, Documentation, Translation) plus a standalone Estimator.
Each team has a single orchestrator that the developer actually talks to. The orchestrator receives the request, asks clarifying questions, and then delegates work to specialists in a defined sequence. Specialists never talk to the developer directly. All communication flows through the orchestrator. This is the Orchestrator + Subagents pattern applied in earnest: developer UX stays simple (one conversation) while the pipeline underneath can get as sophisticated as it needs to.
### The Development Team: QRSPI
The Development Team is seven agents running the QRSPI workflow: Questions → Research → design → Plan → Implement. (The design step lives inside the Planner's three modes, which is why the acronym reads the way it does.) Each handoff is a structured input to the next stage, in the shape described in Specialist Handoff / Pipeline; a TypeScript sketch of those handoff shapes follows the table below.
| Agent | Role | Tool access |
|---|---|---|
| Dev Team (Orchestrator) | Receives request, coordinates pipeline | Delegation only |
| Dev Planner | Questions, design discussion, implementation plan | Read-only |
| Dev Researcher | Objective codebase exploration | Read-only |
| Dev Developer | Implements the plan | Edit + execute |
| Dev Unit Test Writer | Writes Vitest unit tests | Edit + execute |
| Dev Visual Test Writer | Storybook stories + visual regression | Edit + execute + Figma MCP |
| Dev Reviewer | Correctness + convention compliance | Read + lint only |
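As a rough sketch of those structured handoffs (the names are illustrative, not the team's actual schema):

```ts
// What the Planner hands the Researcher: questions only, intent stripped out.
interface ResearchQuestion {
  id: string;
  question: string; // e.g. "Where is theming state held?"
}

// What the Researcher hands back: facts about the codebase, no proposals.
interface ResearchFinding {
  questionId: string;
  answer: string;      // what exists and how it works, stated as fact
  locations: string[]; // file paths where the relevant code lives
}

// What the Planner eventually produces for the Developer.
interface ImplementationPhase {
  title: string;
  description: string;
  verification: string; // how this vertical slice is validated on its own
}

interface ImplementationPlan {
  designSummary: string;
  phases: ImplementationPhase[]; // each phase independently testable
}
```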
#### Why Research Is Separated from Planning
This is the single highest-leverage design decision in the setup.
The Dev Researcher never sees the feature description, the ticket, or the implementation intent. It receives only research questions ("Where is theming state held?", "How are MCP tool results rendered?") generated by the Dev Planner from the original feature request with the intent stripped out.
The reason is blunt: when a language model knows what you're building, its "research" becomes confirmation-biased. It starts looking for things that support the approach it's already quietly imagining. Keep the Researcher objective (restrict it to documenting what exists, how it works, and where it lives) and the facts it produces are cleaner. The Planner then reasons over those facts instead of over its own earlier guesses. This is the AI Coding Guidelines principle that context quality determines output quality, flipped around: sometimes the most valuable thing you can do for context quality is to withhold intent.
You don't need a seven-agent team to benefit from this. Even in a two-agent or solo-with-prompts setup, running a pure "investigate the codebase, answer these questions, don't propose a solution" pass before any design discussion is transferable and cheap. The separation is the idea; the org chart is the implementation.
#### Interactive Design Before Planning
The Dev Planner runs in three separately invoked modes:
- Question Generation: turns the feature description into 5–12 research questions for the Researcher.
- Interactive Design: presents its understanding of the problem, surfaces open questions and tradeoffs, and works through them with the developer before committing to anything.
- Implementation Planning: only after design is settled, produces a phased, vertical implementation plan where each phase is independently testable.
The design step is mandatory for new features, and it is the highest-leverage review point in the pipeline. In the team's own words: "catching a bad design decision in a 200-line design document is far more efficient than catching it in a 1000-line plan or after the code is written."
Seen through the lens of AI Coding Agents, this is plan → act → observe with the human-in-the-loop checkpoint placed deliberately at the cheapest-to-fix stage, not the most expensive one. Design review before implementation review. It's the difference between arguing about an approach and arguing about a diff.
#### Vertical Phases Over Horizontal Layers
Implementation plans are broken into vertical phases (end-to-end slices of functionality that each produce something testable on their own) rather than horizontal layers (all types first, then all components, then all tests). Horizontal planning looks tidier on paper and is almost always worse in practice: nothing works until the last layer lands, and the early phases can't be validated against reality.
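As an invented illustration, a vertical plan for a hypothetical "export chat transcript" feature might slice like this:

```
Phase 1: Plain-text export behind a chat menu action
         (testable: a file downloads with the message content)
Phase 2: Markdown export with sender and timestamp metadata
         (testable: output matches fixtures)
Phase 3: Localized menu labels and file names
         (testable: the translation key-parity suite passes)
```

Each phase ships something a reviewer can run; the horizontal version of the same plan would produce nothing verifiable until the final phase.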
#### The Trivial Change Shortcut
Not every ticket deserves QRSPI. For single-file bug fixes or small adjustments with no real design decisions, the Dev Team orchestrator can skip straight to plan + implement. Structured, not rigid.
Ceremony should scale with risk. A pipeline that forces full research-and-design on a one-line CSS fix will get bypassed entirely within a week. Building the shortcut in deliberately โ as an explicit mode, not an accidental escape hatch โ keeps the default path honest.
### The Documentation Team
Four agents: orchestrator, researcher (web + codebase), architect (structure and review), writer (Markdown). The pipeline is research → plan → write → review, and the architect closes the loop by reviewing the finished content against the original structure. This is the same shape the Web Hub itself is written with; see Introducing Our AI-Assisted Development Docs.
### The Translation Team
18 target languages, English as the single source of truth. The Translation orchestrator reads the English file, diffs its key structure against every target file, identifies missing and stale keys, and delegates per-language translation to a Translator specialist. After each language comes back, the orchestrator validates: JSON parses, all keys present, no stale keys, locale-identifier values (non-translatable) untouched.
The load-bearing piece is a CI test that enforces exact key parity with the English source. The automation rides on top of a deterministic guardrail: the AI does the translation, and the test catches anything the AI got structurally wrong. Neither half would work alone.
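A minimal sketch of what that guardrail can look like, assuming Vitest and JSON locale files (the paths, locale names, and helper are illustrative):

```ts
import { describe, expect, it } from 'vitest';
// Assumes resolveJsonModule; en.json is the single source of truth.
import en from '../locales/en.json';
import de from '../locales/de.json'; // one of the 18 target languages

// Flatten nested translation objects into dot-separated key paths,
// so { chat: { send: "Send" } } becomes ["chat.send"].
const flattenKeys = (obj: Record<string, unknown>, prefix = ''): string[] =>
  Object.entries(obj).flatMap(([key, value]) => {
    const path = prefix ? `${prefix}.${key}` : key;
    return value !== null && typeof value === 'object'
      ? flattenKeys(value as Record<string, unknown>, path)
      : [path];
  });

describe('locale key parity', () => {
  it('de.json matches the English key structure exactly', () => {
    // Exact parity in both directions: no missing keys, no stale keys.
    expect(flattenKeys(de).sort()).toEqual(flattenKeys(en).sort());
  });
});
```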
### The Estimator
A standalone agent, outside the three teams. Given a feature, it researches the actual code before estimating; it never guesses from file names. It decomposes the work into atomic subtasks, assesses complexity, and produces effort ranges, never single-number point estimates. Most of its output value is in the risks and assumptions it surfaces alongside the numbers.
## The Principle of Least Privilege
Tool access is scoped tightly per agent:
- Planners and researchers: read-only
- Reviewers: read + lint, no edits
- Developers: edit + execute, no test authoring
- Test writers: edit + execute, no design tools
The safety argument is real (a read-only planner can't accidentally trash the repo), but it isn't the main argument. The main argument is that tool access defines the role. A planner that can also write code is tempted to skip planning. A reviewer that can edit files is tempted to fix issues instead of reporting them clearly. Restricting tools keeps each agent focused on its role. This aligns with AI Coding Agents → Human-in-the-Loop and the least-privilege caution in MCP Servers.
Tool scope is role definition. An agent with more reach than its role drifts, quietly, and usually toward whatever tool is easiest to reach for. If you find yourself adding "please don't edit files" to an agent's prompt, take the edit tool away instead.
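In VS Code's custom chat mode format, that scoping is just the frontmatter `tools` list. A sketch of a read-only planner mode (tool names vary by Copilot version and are illustrative):

```markdown
---
description: 'Dev Planner: questions, design discussion, implementation plans.'
tools: ['codebase', 'search', 'usages'] # deliberately no editFiles or runCommands
---

You are the Dev Planner. You read code and produce plans; you never edit files.
If a change seems necessary, describe it in the plan for the Dev Developer.
```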
## MCP Integrations
MCP servers are opt-in per agent, with graceful degradation when they're not available:
- ESLint MCP: Dev Developer and Dev Reviewer get lint feedback directly, without shelling out.
- Figma MCP: Dev Developer and Dev Visual Test Writer reference design files during implementation.
- GitHub MCP: powers PR dashboards and repo queries.
The general shape matches MCP Servers: one server per integration, tool access gated per agent, and no agent gets an MCP server it doesn't actually need for its role.
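For context, VS Code reads workspace MCP servers from `.vscode/mcp.json`. A sketch with two stdio-style servers (the ESLint server from the list above and the Chrome DevTools server used by the E2E prompt below; entries and package names are illustrative and may have changed):

```jsonc
// .vscode/mcp.json
{
  "servers": {
    "eslint": {
      "type": "stdio",
      "command": "npx",
      "args": ["@eslint/mcp@latest"]
    },
    "chrome-devtools": {
      "type": "stdio",
      "command": "npx",
      "args": ["chrome-devtools-mcp@latest"]
    }
  }
}
```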
## Reusable Prompt Files
Two prompt files earn their keep independently of the agent teams.
PR Dashboard. A personalized per-developer view: authored PRs, review requests, CI status, suggested priorities. Runs against the GitHub MCP server.
Agentic E2E Testing. Test scenarios written as plain Markdown user journeys: "Navigate to the app, accept the EULA, verify the chat interface appears." A prompt reads the scenario and drives a real browser through the Chrome DevTools MCP server: navigating, clicking, screenshotting, validating. Because scenarios are high-level, there are no brittle CSS selectors to maintain; the agent translates intent into interactions and adapts when the DOM shifts. Screenshots are captured on anomalies. It's experimental, but already useful for smoke and regression runs. It's also a good advertisement for what MCP Servers enable when you combine them with specific-purpose prompts.
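A scenario file in this style might read like the following (a hypothetical sketch; the team's actual format isn't shown here):

```markdown
# Scenario: First-run chat smoke test

1. Navigate to the app.
2. Accept the EULA.
3. Verify the chat interface appears.
4. Send a short message and verify an assistant response renders.
5. On any failed step, capture a screenshot and stop.
```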
## Agent Context Files: Persistent Memory Across Sessions
Language models don't retain state between sessions. The team's workaround is context files: per-feature curated folders containing requirements, architecture decisions, known issues, and current-state assessments. Agents read the entire folder first on session start. The files are kept in sync as work progresses: resolved issues marked, new decisions added, scope changes reflected.
Functionally, context files are the per-feature analogue of .github/copilot-instructions.md: a persistent, curated external memory the agents can reload into context on demand. AI Coding Agents → Memory treats this kind of external memory as the thing that separates "agent that starts cold every time" from "agent that remembers why your codebase looks the way it does."
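The shape of such a folder might look like this (the layout and file names are illustrative):

```
docs/context/theming-refactor/
  requirements.md            # what the feature must do
  architecture-decisions.md  # decisions made so far, with rationale
  known-issues.md            # open and resolved issues, marked as such
  current-state.md           # where the work stands right now
```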
## How This Maps to the Web Hub Playbook
| Nemetschek construct | Web Hub doc |
|---|---|
| Copilot Instructions layer | Workspace Instruction Files |
| Orchestrator + specialist teams | Orchestrator + Subagents |
| QRSPI handoff chain | Specialist Handoff / Pipeline |
| Design-first checkpoint | AI Coding Agents (plan → act → observe) |
| Per-agent tool scoping | MCP Servers (least privilege) |
| Context files | AI Coding Agents → Memory |
The team didn't invent new primitives. They committed hard to the ones the docs recommend and composed them with discipline. The result looks elaborate from the outside and feels simple from the inside, which is usually the sign that the composition is right.
## Takeaways
- Separate concerns across three layers. Instructions for conventions, agents for workflows, prompts for one-shots. Don't overload a single layer.
- Orchestrator + specialists keeps developer UX simple while letting the underlying pipeline grow sophisticated without leaking complexity upward.
- Strip intent from research to dodge confirmation bias. Highest-leverage idea in the setup, transferable to any two-agent configuration.
- Design review before implementation review. A bad decision costs orders of magnitude less to catch in a design document than in a finished diff.
- Least privilege keeps agents in their lane. Tool scope is role definition; extra reach causes drift.
- Context files are persistent memory between sessions. Curated per-feature folders are how you stop every conversation starting cold.
If any of this resonates with your project, the building blocks are already documented: start with Multi-Agent Orchestration and MCP Servers, and wire your own orchestrator + specialists on top.
