# We Accidentally Built an LLM Wiki 📚

· 9 min read
Gergely Sipos
Frontend Architect

In a recent blog post we explored Karpathy's LLM Wiki pattern, the idea that an LLM should build a persistent, interlinked wiki instead of re-deriving knowledge from scratch on every query. A few days later, it clicked: Aliz Web Hub, a Docusaurus site where AI agents contribute documentation via GitHub PRs, is essentially the same pattern with different tooling. The project predates Karpathy's gist; we noticed the parallel only afterwards. The pieces map surprisingly cleanly.

## The Pattern, Briefly

The core tension behind the LLM Wiki is that RAG re-derives knowledge on every query. You chunk documents, embed them, retrieve fragments, and ask the model to synthesize an answer, from scratch, every single time. Nothing compounds. The LLM Wiki flips this: the model reads sources, compiles them into persistent interlinked markdown, and keeps the wiki current as new sources arrive. The knowledge is built once and maintained, not reconstructed per question.

As Karpathy puts it: "The tedious part of maintaining a knowledge base is not the reading or the thinking; it's the bookkeeping."

The architecture has three layers: raw sources (your documents), the wiki itself (LLM-generated markdown), and the schema (an instruction file like CLAUDE.md that tells the agent how to structure everything). If you've read the gist, this is familiar. If you haven't, it's worth the five minutes.
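As a rough sketch, that layered layout looks something like this (the sources/ and wiki/ folder names follow the gist; the individual files are illustrative):

```text
repo/
├── CLAUDE.md            # schema: instructions for how the agent structures the wiki
├── sources/             # raw sources: curated documents, kept verbatim
│   └── some-paper.md
└── wiki/                # the wiki: LLM-generated, interlinked markdown
    ├── index.md
    └── topic-page.md
```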

## The Mapping

Here's how the two systems line up.

| Dimension | Karpathy's LLM Wiki | Aliz Web Hub |
| --- | --- | --- |
| Frontend | Obsidian (local, graph view) | Docusaurus (deployed website, sidebar, search) |
| Storage | Local files + CLI agent | GitHub repo + PR workflow |
| AI agent | Claude Code (single agent, CLAUDE.md) | GitHub Copilot coding agent (multi-agent, .github/agents/*.md) |
| Knowledge expansion | Agent reads sources, updates wiki | Describe what's needed in an issue → agent opens a PR |
| Audience | Single user (personal) | Team-accessible (deployed) |
| Hallucination defense | Manual review in Obsidian | PR review + CI build |

The dimensions differ, but the shape is the same. Both are AI-maintained, markdown-based knowledge bases where an agent handles the structural bookkeeping that humans won't do consistently.

## Frontend: Obsidian vs. Docusaurus

Both are markdown-based knowledge bases with folder structures. Obsidian is local, desktop-first, built for personal knowledge management; the graph view is its killer feature. Docusaurus is a static site generator: it compiles markdown into a deployed website with sidebar navigation, full-text search, and versioning. Different tools, same fundamental idea: a structured collection of interlinked markdown files.

Karpathy frames it neatly: "Obsidian is the IDE; the LLM is the programmer; the wiki is the codebase." For Aliz Web Hub, Docusaurus plays the same role. And the "compiled vs. interpreted" metaphor from the gist is literally true here: Docusaurus compiles markdown into static HTML, so the wiki is a compiled artifact.

The key property both share: plain markdown in a folder. Readable by humans, writable by AI agents. No proprietary format, no database, no opaque storage layer. That's what makes the pattern work.

## Storage: Local Files vs. GitHub

Karpathy's setup uses local files with Claude Code running as a CLI agent. Aliz Web Hub uses a GitHub repository. This is a small difference in mechanics but a meaningful one in contribution model.

With GitHub, contributors (human or AI) don't need to clone anything locally. GitHub's Copilot coding agent can be assigned an issue and work entirely in the cloud. It reads the repo, makes changes on a branch, and opens a pull request. The PR workflow creates a natural review checkpoint that doesn't exist in the local-file model.
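In practice, kicking off a contribution is one command. A hypothetical example: the issue title and body below are made up, and how the Copilot coding agent gets assigned depends on your GitHub setup:

```bash
# File a documentation request as a GitHub issue (illustrative content).
gh issue create \
  --title "Document the staging deployment workflow" \
  --body "Cover prerequisites, the deploy command, and rollback steps."

# Once the issue is assigned to the Copilot coding agent, it works in the
# cloud, commits to a branch, and opens a PR for human review.
```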

Instruction files map directly: Karpathy uses CLAUDE.md to tell Claude Code how the wiki should be structured. Aliz Web Hub uses .github/agents/*.md files, one per agent role, to give each agent its specific context and constraints. Same concept, different granularity.
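For illustration, one of those role files might look like the sketch below. The file name, frontmatter fields, and instructions are assumptions for this post, not the actual Aliz Web Hub configuration:

```markdown
<!-- .github/agents/writer.md (hypothetical example) -->
---
name: writer
description: Drafts documentation pages in the site's established tone.
---

You are the writer agent. Given a content plan from the architect:

- Write the page as Docusaurus markdown with YAML frontmatter.
- Match the tone and depth of existing pages in the same section.
- Add internal cross-references to related pages where they exist.
- Do not invent facts; flag gaps for the researcher instead.
```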

## AI Agents: Single vs. Multi-Agent

Karpathy runs a single agent: Claude Code, working locally, guided by one CLAUDE.md file. Aliz Web Hub runs a multi-agent pipeline through GitHub's Copilot coding agent with four custom sub-agents: an orchestrator, a researcher, an architect, and a writer. The orchestrator decomposes the task; each sub-agent handles a different phase.

The pipeline looks like: issue → orchestrator → researcher → architect → writer → PR → human review.

Honest assessment: it's still being fine-tuned. The orchestrator occasionally over-decomposes simple tasks into too many steps. The researcher sometimes returns more context than the downstream agents can usefully consume. Multi-agent orchestration adds real coordination overhead. But for a team-maintained documentation project where consistency, research quality, and structural correctness all matter, the specialization pays off in ways a single generalist agent doesn't match.

For personal knowledge management, a single agent is almost certainly the right call. For a team project with multiple content types, style conventions, and a CI pipeline, the multi-agent approach earns its complexity.

## Knowledge Expansion: The Bookkeeping Problem

This is the heart of both systems. The bottleneck in maintaining a knowledge base has never been reading, thinking, or writing; it's the bookkeeping. Filing new content in the right place, updating the sidebar, setting frontmatter, fixing cross-references, matching the existing tone, creating category files. The kind of work that's easy to describe and tedious to execute. Both systems delegate this entirely to AI agents.

In Aliz Web Hub, the workflow is: describe the content you need in a GitHub issue, assign it to the Copilot coding agent, and the agent pipeline produces a PR. The agents handle sidebar positioning, _category_.json files, YAML frontmatter, internal cross-references, and tone matching against existing content. A human reviews and merges. The knowledge compounds; the bookkeeping is automated.
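Concretely, that bookkeeping is small structural files. The values below are illustrative, but sidebar_position frontmatter and _category_.json are standard Docusaurus conventions. A generated page starts with frontmatter like:

```markdown
---
title: Deploying to Staging
sidebar_position: 3
tags: [deployment, how-to]
---
```

and its section is placed in the sidebar by a _category_.json:

```json
{
  "label": "Guides",
  "position": 2
}
```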

Karpathy's quote bears repeating: "The tedious part of maintaining a knowledge base is not the reading or the thinking; it's the bookkeeping." Both systems take that insight and build an architecture around it.

## Where They Diverge

The mapping is clean, but the differences aren't just cosmetic; they reflect genuine design trade-offs.

:::caution

Both systems face the risk of persistent hallucinations. When an LLM hallucinates during RAG, the error is ephemeral: it disappears after the conversation. When an LLM writes a factual error into a wiki page, it gets baked in and can influence future synthesis. Aliz Web Hub adds a structural defense: the PR review step and the CI build, which catches broken links and build errors. It's not foolproof (a confident-sounding but wrong claim can sail through review), but it adds a checkpoint that the local-file model doesn't have.

:::
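A minimal sketch of that CI checkpoint, assuming a standard GitHub Actions setup (the workflow file and steps are assumptions, not the project's actual pipeline):

```yaml
# .github/workflows/ci.yml (hypothetical): build every PR before merge.
name: ci
on: [pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - run: npm ci
      # `docusaurus build` throws on broken internal links by default,
      # so a bad cross-reference fails the PR check.
      - run: npm run build
```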

The audience difference drives more design decisions than you'd expect. Karpathy's wiki serves one person. It doesn't need consistent tone, beginner-friendly navigation, search indexing, or a deployment pipeline. Aliz Web Hub serves a team. It needs all of those things, and the multi-agent pipeline exists specifically to maintain that consistency across contributions from different agents and humans. Different problems, different tools.

## The Three Layers, Mapped

The LLM Wiki's three-layer architecture maps directly onto Aliz Web Hub:

| Layer | LLM Wiki | Aliz Web Hub |
| --- | --- | --- |
| Raw sources | Curated documents in sources/ | GitHub issues, research briefs, external references |
| The wiki | LLM-generated markdown in wiki/ | Docs and blog posts in web-hub-website/ |
| The schema | CLAUDE.md | .github/agents/*.md |

The operations map too. Ingest = an AI agent generates content via a pull request. Query = browse the deployed Docusaurus site (or search it). Lint = the CI build catches broken links, invalid frontmatter, and build failures. Same conceptual operations, different implementation surface.
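The lint operation comes mostly for free from configuration. A sketch (the two options shown are standard Docusaurus settings; the rest of the config is elided):

```ts
// docusaurus.config.ts (excerpt): make the build fail loudly on rot.
export default {
  // ...title, url, baseUrl, presets elided...
  onBrokenLinks: 'throw',         // broken internal links fail the build
  onBrokenMarkdownLinks: 'warn',  // or 'throw' to be stricter
};
```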

## Markdown Is the API

Both systems chose plain markdown over proprietary formats. This isn't a coincidence. Markdown files in a Git repo are the simplest possible interface for AI agents: no SDK, no database client, no API authentication. The agent reads a file, writes a file, and commits the change. The file format is the API. Every layer of sophistication you add on top (Obsidian's graph view, Docusaurus's sidebar, GitHub's PR workflow) is optional tooling over a format that's been stable for a decade. That stability is what makes the whole pattern viable.
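The entire agent-facing surface fits in a few commands (a sketch; the branch name and file path are made up):

```bash
# Read a file, write a file, commit the change: the file format is the API.
git checkout -b docs/add-staging-guide
# ...the agent edits docs/guides/staging-deployment.md as plain markdown...
git add docs/guides/staging-deployment.md
git commit -m "docs: add staging deployment guide"
gh pr create --fill   # the review checkpoint
```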

## What We're Still Figuring Out

This post would be dishonest without naming the gaps.

- **Multi-agent pipeline accuracy.** The pipeline doesn't always get it right on the first pass. Agent-generated PRs regularly need revision, sometimes structural, sometimes factual. We're still iterating on the instruction files that govern each agent's behavior.
- **Scaling.** As the docs grow, the context-window bottleneck becomes real. An agent needs to understand the existing site structure to place new content correctly, and that structural context grows with every page. At some point we may need RAG on top of the wiki, the same irony Karpathy flags in his gist.
- **Quality calibration.** Getting consistent tone and depth from agents across different content types (tutorials vs. reference pages vs. blog posts) is genuinely hard. The instruction files help, but they're not a substitute for editorial judgment.
- **Source attribution.** Tracing agent-generated content back to the research sources that informed it isn't systematic yet. The researcher agent gathers sources, but the chain from source → research brief → final content isn't always visible in the finished artifact.
:::note

This post itself was produced using the multi-agent pipeline described above. An orchestrator decomposed the task, a researcher gathered context, an architect wrote the content plan, and a writer produced the draft. A human reviewed and merged the PR.

:::

## Further Reading