# Inside Nemetschek's Multi-Agent Copilot Setup 🤖🧩
When your product is an AI assistant, using AI to build it feels natural, but doing it well is harder than it sounds. The AI-Assisted Development section describes these patterns in the abstract. This post is what they look like after a year in production on a real Aliz frontend: a React + TypeScript chat-based AI assistant with theming, 18-language internationalization, MCP integrations, and multi-environment deploys. The codebase is large enough that no single prompt can reason about it coherently, which is the whole reason the team stopped reaching for a tool and started building a system: the same shape described in Multi-Agent Orchestration. Three layers of AI setup, a team of specialist agents, and a workflow called QRSPI hold it together.
## From Ad-Hoc to a Pipeline
A year ago the team was using Copilot the way most teams do: chat window open, paste in some context, argue with the model, ship the diff. That worked fine for isolated changes and fell apart on anything that touched more than two or three files. The failure mode was predictable. A single prompt would lose the plot halfway through a feature. Context evaporated between sessions. Two developers asking the model the same question would get two different answers, both plausibly wrong.
The fix wasn't a better prompt. It was structure. The current setup runs on GitHub Copilot's custom agent modes and reshapes day-to-day AI use into a pipeline with defined roles, defined handoffs, and defined tool access: the progression described in AI-Assisted Development, from autocomplete to chat to agents to multi-agent pipelines.
## Three Layers of AI-Assisted Coding
Everything below sits on top of a clean separation of concerns between three layers.
| Layer | Scope | Invoked | Purpose |
|---|---|---|---|
| Copilot Instructions | Always-on | Every Copilot chat, automatically | Global conventions and style rules |
| Custom Agent Modes | Per-workflow | Developer opens a specific agent | Multi-step, tool-restricted pipelines |
| Prompt Files | Per-invocation | Developer runs a named prompt | One-shot reusable tasks |
Instructions define how the AI should behave globally. Agents define what the AI does for specific workflows. Prompt files handle recurring tasks that don't need a whole pipeline. Each layer is independently maintainable, and none of them try to carry work the others should be doing.
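In a VS Code + Copilot setup, the three layers typically live in conventional locations under `.github/` (a sketch of the layout; the file names are illustrative, not the team's actual ones):

```
.github/
  copilot-instructions.md          # Layer 1: always-on conventions
  instructions/
    docs.instructions.md           # Layer 1: scoped rules for specific file types
  chatmodes/
    dev-team.chatmode.md           # Layer 2: orchestrator agent mode
    dev-researcher.chatmode.md     # Layer 2: specialist agent mode
  prompts/
    pr-dashboard.prompt.md         # Layer 3: one-shot reusable prompt
```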
### Always-On Instructions
A shared repo-level instruction file is inherited by every Copilot chat in the project. It encodes the conventions that otherwise drift across contributors: arrow-function components, no TypeScript enums, TailwindCSS without arbitrary values, currentColor in SVGs so theming keeps working. Scoped instructions kick in for specific file types; documentation files get their own formatting and linking rules, for example.
The developer doesn't see this layer. It just is. New Copilot chat, conventions already active, no setup step. This is exactly what Prompt Engineering → Workspace Instruction Files recommends, and it's the cheapest lift with the widest blast radius.
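To make those conventions concrete, here is a hypothetical component that satisfies all four rules; the component itself is invented for illustration and assumes the automatic JSX runtime:

```tsx
// Union type instead of a TypeScript enum, per the repo conventions.
type StatusVariant = 'ok' | 'warning' | 'error';

type StatusIconProps = {
  variant: StatusVariant;
};

// Arrow-function component; Tailwind utility classes only, no arbitrary values.
export const StatusIcon = ({ variant }: StatusIconProps) => (
  <span className="inline-flex items-center gap-1 text-sm">
    {/* currentColor lets the active theme drive the icon color */}
    <svg viewBox="0 0 16 16" className="h-4 w-4" fill="currentColor" aria-hidden="true">
      <circle cx="8" cy="8" r="6" />
    </svg>
    {variant}
  </span>
);
```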
## The Agent Team Model
Above the always-on layer sit three teams (Development, Documentation, Translation) plus a standalone Estimator.
Each team has a single orchestrator that the developer actually talks to. The orchestrator receives the request, asks clarifying questions, and then delegates work to specialists in a defined sequence. Specialists never talk to the developer directly. All communication flows through the orchestrator. This is the Orchestrator + Subagents pattern applied in earnest: developer UX stays simple (one conversation) while the pipeline underneath can get as sophisticated as it needs to.
### The Development Team: QRSPI
The Development Team is seven agents running the QRSPI workflow: Questions → Research → design → Plan → Implement. (The design step lives inside the Planner's three modes, which is why the acronym reads the way it does.) Each handoff is a structured input to the next stage, in the shape described in Specialist Handoff / Pipeline; a TypeScript sketch of those handoff shapes follows the table below.
| Agent | Role | Tool access |
|---|---|---|
| Dev Team (Orchestrator) | Receives request, coordinates pipeline | Delegation only |
| Dev Planner | Questions, design discussion, implementation plan | Read-only |
| Dev Researcher | Objective codebase exploration | Read-only |
| Dev Developer | Implements the plan | Edit + execute |
| Dev Unit Test Writer | Writes Vitest unit tests | Edit + execute |
| Dev Visual Test Writer | Storybook stories + visual regression | Edit + execute + Figma MCP |
| Dev Reviewer | Correctness + convention compliance | Read + lint only |
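As a rough sketch of those structured handoffs (the names are illustrative, not the team's actual schema):

```ts
// What the Planner hands the Researcher: questions only, intent stripped out.
interface ResearchQuestion {
  id: string;
  question: string; // e.g. "Where is theming state held?"
}

// What the Researcher hands back: facts about the codebase, no proposals.
interface ResearchFinding {
  questionId: string;
  answer: string;      // what exists and how it works, stated as fact
  locations: string[]; // file paths where the relevant code lives
}

// What the Planner eventually produces for the Developer.
interface ImplementationPhase {
  title: string;
  description: string;
  verification: string; // how this vertical slice is validated on its own
}

interface ImplementationPlan {
  designSummary: string;
  phases: ImplementationPhase[]; // each phase independently testable
}
```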
#### Why Research Is Separated from Planning
This is the single highest-leverage design decision in the setup.
The Dev Researcher never sees the feature description, the ticket, or the implementation intent. It receives only research questions ("Where is theming state held?", "How are MCP tool results rendered?") generated by the Dev Planner from the original feature request with the intent stripped out.
The reason is blunt: when a language model knows what you're building, its "research" becomes confirmation-biased. It starts looking for things that support the approach it's already quietly imagining. Keep the Researcher objective (restrict it to documenting what exists, how it works, and where it lives) and the facts it produces are cleaner. The Planner then reasons over those facts instead of over its own earlier guesses. This is the AI Coding Guidelines principle that context quality determines output quality, flipped around: sometimes the most valuable thing you can do for context quality is to withhold intent.
You don't need a seven-agent team to benefit from this. Even in a two-agent or solo-with-prompts setup, running a pure "investigate the codebase, answer these questions, don't propose a solution" pass before any design discussion is transferable and cheap. The separation is the idea; the org chart is the implementation.
#### Interactive Design Before Planning
The Dev Planner runs in three separately invoked modes:
- Question Generation: turns the feature description into 5–12 research questions for the Researcher.
- Interactive Design: presents its understanding of the problem, surfaces open questions and tradeoffs, and works through them with the developer before committing to anything.
- Implementation Planning: only after design is settled, produces a phased, vertical implementation plan where each phase is independently testable.
The design step is mandatory for new features, and it is the highest-leverage review point in the pipeline. In the team's own words: "catching a bad design decision in a 200-line design document is far more efficient than catching it in a 1000-line plan or after the code is written."
Seen through the lens of AI Coding Agents, this is plan → act → observe with the human-in-the-loop checkpoint placed deliberately at the cheapest-to-fix stage, not the most expensive one. Design review before implementation review. It's the difference between arguing about an approach and arguing about a diff.
#### Vertical Phases Over Horizontal Layers
Implementation plans are broken into vertical phases (end-to-end slices of functionality that each produce something testable on their own) rather than horizontal layers (all types first, then all components, then all tests). Horizontal planning looks tidier on paper and is almost always worse in practice: nothing works until the last layer lands, and the early phases can't be validated against reality.
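As an invented illustration, a vertical plan for a hypothetical "export chat transcript" feature might slice like this:

```
Phase 1: Plain-text export behind a chat menu action
         (testable: a file downloads with the message content)
Phase 2: Markdown export with sender and timestamp metadata
         (testable: output matches fixtures)
Phase 3: Localized menu labels and file names
         (testable: the translation key-parity suite passes)
```

Each phase ships something a reviewer can run; the horizontal version of the same plan would produce nothing verifiable until the final phase.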
#### The Trivial Change Shortcut
Not every ticket deserves QRSPI. For single-file bug fixes or small adjustments with no real design decisions, the Dev Team orchestrator can skip straight to plan + implement. Structured, not rigid.
Ceremony should scale with risk. A pipeline that forces full research-and-design on a one-line CSS fix will get bypassed entirely within a week. Building the shortcut in deliberately โ as an explicit mode, not an accidental escape hatch โ keeps the default path honest.
### The Documentation Team
Four agents: orchestrator, researcher (web + codebase), architect (structure and review), writer (Markdown). The pipeline is research → plan → write → review, and the architect closes the loop by reviewing the finished content against the original structure. This is the same shape the Web Hub itself is written with; see Introducing Our AI-Assisted Development Docs.
### The Translation Team
18 target languages, English as the single source of truth. The Translation orchestrator reads the English file, diffs its key structure against every target file, identifies missing and stale keys, and delegates per-language translation to a Translator specialist. After each language comes back, the orchestrator validates: JSON parses, all keys present, no stale keys, locale-identifier values (non-translatable) untouched.
The load-bearing piece is a CI test that enforces exact key parity with the English source. The automation rides on top of a deterministic guardrail: the AI does the translation, and the test catches anything the AI got structurally wrong. Neither half would work alone.
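A minimal sketch of what that guardrail can look like, assuming Vitest and JSON locale files (the paths, locale names, and helper are illustrative):

```ts
import { describe, expect, it } from 'vitest';
// Assumes resolveJsonModule; en.json is the single source of truth.
import en from '../locales/en.json';
import de from '../locales/de.json'; // one of the 18 target languages

// Flatten nested translation objects into dot-separated key paths,
// so { chat: { send: "Send" } } becomes ["chat.send"].
const flattenKeys = (obj: Record<string, unknown>, prefix = ''): string[] =>
  Object.entries(obj).flatMap(([key, value]) => {
    const path = prefix ? `${prefix}.${key}` : key;
    return value !== null && typeof value === 'object'
      ? flattenKeys(value as Record<string, unknown>, path)
      : [path];
  });

describe('locale key parity', () => {
  it('de.json matches the English key structure exactly', () => {
    // Exact parity in both directions: no missing keys, no stale keys.
    expect(flattenKeys(de).sort()).toEqual(flattenKeys(en).sort());
  });
});
```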
### The Estimator
A standalone agent, outside the three teams. Given a feature, it researches the actual code before estimating; it never guesses from file names. It decomposes the work into atomic subtasks, assesses complexity, and produces effort ranges, never single-number point estimates. Most of its output value is in the risks and assumptions it surfaces alongside the numbers.
## The Principle of Least Privilege
Tool access is scoped tightly per agent:
- Planners and researchers: read-only
- Reviewers: read + lint, no edits
- Developers: edit + execute, no test authoring
- Test writers: edit + execute, no design tools
The safety argument is real (a read-only planner can't accidentally trash the repo), but it isn't the main argument. The main argument is that tool access defines the role. A planner that can also write code is tempted to skip planning. A reviewer that can edit files is tempted to fix issues instead of reporting them clearly. Restricting tools keeps each agent focused on its role. This aligns with AI Coding Agents → Human-in-the-Loop and the least-privilege caution in MCP Servers.
Tool scope is role definition. An agent with more reach than its role drifts, quietly, and usually toward whatever tool is easiest to reach for. If you find yourself adding "please don't edit files" to an agent's prompt, take the edit tool away instead.
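In VS Code's custom chat mode format, that scoping is just the frontmatter `tools` list. A sketch of a read-only planner mode (tool names vary by Copilot version and are illustrative):

```markdown
---
description: 'Dev Planner: questions, design discussion, implementation plans.'
tools: ['codebase', 'search', 'usages'] # deliberately no editFiles or runCommands
---

You are the Dev Planner. You read code and produce plans; you never edit files.
If a change seems necessary, describe it in the plan for the Dev Developer.
```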
## MCP Integrations
MCP servers are opt-in per agent, with graceful degradation when they're not available:
- ESLint MCP: Dev Developer and Dev Reviewer get lint feedback directly, without shelling out.
- Figma MCP: Dev Developer and Dev Visual Test Writer reference design files during implementation.
- GitHub MCP: powers PR dashboards and repo queries.
The general shape matches MCP Servers: one server per integration, tool access gated per agent, and no agent gets an MCP server it doesn't actually need for its role.
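For context, VS Code reads workspace MCP servers from `.vscode/mcp.json`. A sketch with two stdio-style servers (the ESLint server from the list above and the Chrome DevTools server used by the E2E prompt below; entries and package names are illustrative and may have changed):

```jsonc
// .vscode/mcp.json
{
  "servers": {
    "eslint": {
      "type": "stdio",
      "command": "npx",
      "args": ["@eslint/mcp@latest"]
    },
    "chrome-devtools": {
      "type": "stdio",
      "command": "npx",
      "args": ["chrome-devtools-mcp@latest"]
    }
  }
}
```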
## Reusable Prompt Files
Two prompt files earn their keep independently of the agent teams.
PR Dashboard. A personalized per-developer view: authored PRs, review requests, CI status, suggested priorities. Runs against the GitHub MCP server.
Agentic E2E Testing. Test scenarios written as plain Markdown user journeys: "Navigate to the app, accept the EULA, verify the chat interface appears." A prompt reads the scenario and drives a real browser through the Chrome DevTools MCP server: navigating, clicking, screenshotting, validating. Because scenarios are high-level, there are no brittle CSS selectors to maintain; the agent translates intent into interactions and adapts when the DOM shifts. Screenshots are captured on anomalies. It's experimental, but already useful for smoke and regression runs. It's also a good advertisement for what MCP Servers enable when you combine them with specific-purpose prompts.
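A scenario file in this style might read like the following (a hypothetical sketch; the team's actual format isn't shown here):

```markdown
# Scenario: First-run chat smoke test

1. Navigate to the app.
2. Accept the EULA.
3. Verify the chat interface appears.
4. Send a short message and verify an assistant response renders.
5. On any failed step, capture a screenshot and stop.
```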
## Agent Context Files: Persistent Memory Across Sessions
Language models don't retain state between sessions. The team's workaround is context files: per-feature curated folders containing requirements, architecture decisions, known issues, and current-state assessments. Agents read the entire folder first on session start. The files are kept in sync as work progresses: resolved issues marked, new decisions added, scope changes reflected.
Functionally, context files are the per-feature analogue of .github/copilot-instructions.md: a persistent, curated external memory the agents can reload into context on demand. AI Coding Agents → Memory treats this kind of external memory as the thing that separates "agent that starts cold every time" from "agent that remembers why your codebase looks the way it does."
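The shape of such a folder might look like this (the layout and file names are illustrative):

```
docs/context/theming-refactor/
  requirements.md            # what the feature must do
  architecture-decisions.md  # decisions made so far, with rationale
  known-issues.md            # open and resolved issues, marked as such
  current-state.md           # where the work stands right now
```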
## How This Maps to the Web Hub Playbook
| Nemetschek construct | Web Hub doc |
|---|---|
| Copilot Instructions layer | Workspace Instruction Files |
| Orchestrator + specialist teams | Orchestrator + Subagents |
| QRSPI handoff chain | Specialist Handoff / Pipeline |
| Design-first checkpoint | AI Coding Agents (plan → act → observe) |
| Per-agent tool scoping | MCP Servers (least privilege) |
| Context files | AI Coding Agents → Memory |
The team didn't invent new primitives. They committed hard to the ones the docs recommend and composed them with discipline. The result looks elaborate from the outside and feels simple from the inside, which is usually the sign that the composition is right.
## Takeaways
- Separate concerns across three layers. Instructions for conventions, agents for workflows, prompts for one-shots. Don't overload a single layer.
- Orchestrator + specialists keeps developer UX simple while letting the underlying pipeline grow sophisticated without leaking complexity upward.
- Strip intent from research to dodge confirmation bias. Highest-leverage idea in the setup, transferable to any two-agent configuration.
- Design review before implementation review. A bad decision costs orders of magnitude less to catch in a design document than in a finished diff.
- Least privilege keeps agents in their lane. Tool scope is role definition; extra reach causes drift.
- Context files are persistent memory between sessions. Curated per-feature folders are how you stop every conversation starting cold.
If any of this resonates with your project, the building blocks are already documented: start with Multi-Agent Orchestration and MCP Servers, and wire your own orchestrator + specialists on top.
