
I Tried the LLM Wiki Pattern with Local Models: Here's What Happened 🔒

7 min read
Tamás Imets
AI Solution Architect at Aliz

Gergely's post on the LLM Wiki pattern hit close to home. I've been doing something adjacent for a while (personal knowledge management with Obsidian and local LLMs), and I immediately wanted to try the pattern on my own vault. Here's what I ran into, what worked, and what didn't.

Why Local Models? Privacy.

I use Obsidian for personal knowledge management: private notes, journal entries, research threads, half-formed ideas. The kind of stuff you wouldn't paste into a cloud API. That constraint (everything stays local) shapes every decision downstream.

Obsidian is a natural fit for the wiki pattern in theory. It's local-first, everything is plain markdown, and the graph view already gives you a visual map of how your notes connect. If any tool was going to play nicely with Karpathy's vision of "Obsidian as the IDE," it should be the actual Obsidian.

[Image: Obsidian graph view of a large knowledge graph, hundreds of interconnected nodes in colorful clusters.] My actual Obsidian graph: hundreds of notes, years of accumulation. The clusters are visible, but navigating them programmatically is another story.

The problem is that "local" means local models. And local models on consumer hardware are a different universe from Claude or GPT-5 via API.

The Context Window Wall

The local models I could run had a small fraction of the effective context window of a frontier cloud model. That gap matters enormously for the LLM Wiki pattern.

Think about what ingest requires: the model needs to read a new source document, understand the wiki schema, and review relevant existing wiki pages, all at once, in order to decide what to update, what to create, and what to cross-reference. As the wiki grows, the context requirement grows with it. With cloud-scale context windows, you can feed the model a substantial chunk of the wiki alongside the new source. On a local model, you can barely fit the source and the schema, let alone the existing pages that need updating.
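To make the squeeze concrete, here's a minimal sketch of the ingest-time budgeting problem. The helper names are illustrative, the 8k limit stands in for a typical usable local-model window, and the 4-characters-per-token estimate is a crude heuristic rather than anything from my actual setup.

```python
# Rough sketch of the ingest-time context budget (illustrative names).
# Everything that must fit in one prompt: the new source, the wiki schema,
# and every existing page the model might need to update.

def estimate_tokens(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token for English prose.
    return len(text) // 4

def build_ingest_prompt(source: str, schema: str, candidate_pages: list[str],
                        context_limit: int = 8_000) -> str | None:
    budget = context_limit - estimate_tokens(source) - estimate_tokens(schema)
    included, used = [], 0
    for page in candidate_pages:
        cost = estimate_tokens(page)
        if used + cost > budget:
            break  # on a small window this triggers almost immediately
        included.append(page)
        used += cost
    if not included and candidate_pages:
        return None  # the source and schema alone already ate the window
    return "\n\n".join([schema, source, *included])
```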

The deeper issue was reasoning quality, not just window size. Local models struggled with the graph navigation problem: given a new piece of information, which existing wiki nodes are relevant? Which pages need updating? That decision requires understanding the structure and content of the wiki β€” a task that was consistently beyond what my local models could handle reliably.

In hindsight, part of this was on me. My tagging conventions evolved organically over years; they made sense to me but weren't structured the way an LLM agent would want to traverse them. With a schema designed up-front for graph navigation rather than human browsing, smaller models might have had a fairer shot.

note

Advertised context windows and effective context windows are not the same thing. Many local models claim 32k or even 128k tokens, but their quality degrades significantly in the back half of the context. For tasks like wiki ingest that require careful attention across the entire context, the effective window is often much smaller than the spec sheet suggests.
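One way to see the gap for yourself is a crude needle-in-a-haystack probe: bury a known fact at increasing depths in filler text and check where recall breaks down. A sketch under my assumptions (an Ollama server on its default local port; the model name, filler, and depths are placeholders):

```python
# Crude probe for *effective* context: hide a known fact behind growing
# amounts of filler and see where the model stops retrieving it.
import requests

FACT = "The vault password is hummingbird-42."
FILLER = "This sentence is padding about nothing in particular. "

def recalls_fact(depth_sentences: int, model: str = "llama3.1:8b") -> bool:
    prompt = (FILLER * depth_sentences + FACT + FILLER * 20 +
              "\n\nWhat is the vault password?")
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False,
              "options": {"num_ctx": 32768}},  # request the advertised window
    )
    return "hummingbird-42" in resp.json()["response"]

for depth in (50, 500, 2000, 5000):
    print(depth, recalls_fact(depth))
```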

Falling Back to RAG

When the direct wiki approach didn't pan out, I fell back to RAG. It was the pragmatic choice: RAG outsources the "what's relevant?" question to vector similarity search, which is algorithmic and doesn't depend on the model's reasoning capacity. Even a smaller model can generate a decent answer when the retrieval step has already surfaced the right chunks.
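For reference, the shape of that pipeline is roughly the following (a simplified sketch, not my exact setup): sentence-transformers for embeddings, cosine similarity for retrieval, and a local model served by Ollama for generation. The model names, chunk size, and prompt are all placeholders.

```python
# Minimal local RAG sketch: embed markdown chunks, retrieve by cosine
# similarity, and let a local model answer from the retrieved context.
from pathlib import Path

import numpy as np
import requests
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def load_chunks(vault: Path, size: int = 800) -> list[str]:
    chunks = []
    for note in vault.rglob("*.md"):
        text = note.read_text(encoding="utf-8")
        chunks += [text[i:i + size] for i in range(0, len(text), size)]
    return chunks

def answer(question: str, chunks: list[str], k: int = 4) -> str:
    vectors = embedder.encode(chunks, normalize_embeddings=True)
    query = embedder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(vectors @ query)[-k:][::-1]  # indices of the k best chunks
    context = "\n---\n".join(chunks[i] for i in top)
    prompt = f"Answer using only this context:\n{context}\n\nQ: {question}\nA:"
    resp = requests.post("http://localhost:11434/api/generate",
                         json={"model": "llama3.1:8b", "prompt": prompt,
                               "stream": False})
    return resp.json()["response"]
```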

It works. But it has exactly the limitation Gergely highlighted in the original post: no knowledge compounding. Every query re-derives the answer from scratch. There's no persistent synthesis, no cross-referencing, no accumulated understanding. My vault stays a bag of chunks rather than a connected knowledge base.

For my use case, querying personal notes, RAG was good enough for simple lookups. But it couldn't do the thing that made the LLM Wiki pattern exciting in the first place: building something over time.

The GraphRAG Experiment

After RAG, I tried a different angle: GraphRAG with Neo4j.

GraphRAG is an approach from Microsoft Research that builds a knowledge graph from your documents, then uses community detection (specifically the Leiden algorithm) to create hierarchical summaries of related content. Instead of flat vector similarity, you get structured clusters of related information with pre-computed summaries at different levels of abstraction.

The key insight: GraphRAG is more robust to model limitations because a significant part of the work (community detection, hierarchy building, and structural organization) is algorithmic, not LLM-dependent. The model still handles entity extraction and summarization, but the structural decisions that organize those entities are handled by graph algorithms that work the same whether you're running a 7B-parameter model or GPT-5.
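To show how little of that structural step depends on the model, here's a toy sketch using python-igraph and the leidenalg package. The edge list stands in for relations an entity-extraction pass would have produced; everything after that point is pure graph algorithm, not my production pipeline.

```python
# Model-independent part of a GraphRAG-style pipeline: given extracted
# entity co-occurrence edges, Leiden clustering finds the communities
# that later get summarized. The edges below are toy data.
import igraph as ig
import leidenalg as la

edges = [("obsidian", "markdown"), ("markdown", "wikilinks"),
         ("obsidian", "wikilinks"), ("neo4j", "cypher"),
         ("cypher", "graph-queries"), ("neo4j", "graph-queries")]

g = ig.Graph.TupleList(edges, directed=False)
partition = la.find_partition(g, la.ModularityVertexPartition)

for idx, community in enumerate(partition):
    names = [g.vs[v]["name"] for v in community]
    print(f"community {idx}: {names}")  # each cluster becomes one summary unit
```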

There's a philosophical alignment with the LLM Wiki pattern here. Both approaches "compile" knowledge at ingest time rather than re-deriving it at query time. GraphRAG builds a structured graph with pre-computed summaries; the LLM Wiki builds a structured wiki with pre-written pages. Different artifacts, same principle: invest upfront so queries are cheaper and richer.

Neo4j was a natural backend: it's built for exactly this kind of graph structure, and its query language (Cypher) makes it straightforward to traverse relationships.
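A minimal sketch of that backend using the official neo4j Python driver, assuming a local instance; the Entity label, relationship type, and credentials are illustrative, not a schema recommendation.

```python
# Write extracted relations into Neo4j and walk the neighbourhood with Cypher.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def add_relation(tx, src: str, rel: str, dst: str):
    tx.run("MERGE (a:Entity {name: $src}) "
           "MERGE (b:Entity {name: $dst}) "
           "MERGE (a)-[:RELATED {type: $rel}]->(b)",
           src=src, dst=dst, rel=rel)

def neighbours(tx, name: str):
    result = tx.run("MATCH (a:Entity {name: $name})-[*1..2]-(b:Entity) "
                    "RETURN DISTINCT b.name AS name", name=name)
    return [record["name"] for record in result]

with driver.session() as session:
    session.execute_write(add_relation, "obsidian", "stores", "markdown")
    session.execute_write(add_relation, "markdown", "contains", "wikilinks")
    print(session.execute_read(neighbours, "obsidian"))
driver.close()
```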

Status: promising direction, still experimental. The graph structure captures relationships that flat RAG misses entirely, and the algorithmic community detection works regardless of model quality. But it's early days, and the tooling is still rough around the edges.

tip

If you're constrained to local models and want something closer to the LLM Wiki's "compiled knowledge" benefit, GraphRAG is worth investigating. The community detection and hierarchy steps are algorithmic, so you get structural benefits even with smaller models, though entity extraction quality will still scale with model capability.

The obsidian-cli Detour

I also tried obsidian-cli, hoping it would bridge the gap between my vault and an LLM agent. Not much success, though in fairness this was a superficial experiment and I didn't spend a lot of time trying to make it work.

The honest takeaway: tooling wasn't the bottleneck. Obsidian vaults are plain markdown files in a folder. Any LLM agent can read and write them directly. The real constraint was always model quality; no CLI wrapper was going to fix the reasoning limitations of local models.
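To illustrate the point, everything an agent needs to see the vault's link structure is already in the files. A short sketch (the wikilink regex and vault path are my assumptions about a typical vault layout):

```python
# Walk the vault's markdown and collect outgoing [[wikilinks]] per note.
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # link target before any '|' alias or '#' heading

def link_graph(vault: Path) -> dict[str, set[str]]:
    graph: dict[str, set[str]] = {}
    for note in vault.rglob("*.md"):
        targets = set(WIKILINK.findall(note.read_text(encoding="utf-8")))
        graph[note.stem] = targets
    return graph

if __name__ == "__main__":
    for note, links in link_graph(Path("~/vault").expanduser()).items():
        print(note, "->", sorted(links))
```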

What I'd Do Differently

If privacy weren't a constraint, the LLM Wiki pattern would likely work well; cloud models have the context windows and reasoning capacity to handle ingest properly. But the whole point of my setup is that privacy is the constraint, so cloud is off the table.

The most promising fully-local path is GraphRAG. It offloads the hardest structural decisions to algorithms and lets the model focus on what even smaller models do reasonably well: summarization.

And models are improving fast. What doesn't work with today's local models may work in six months. The 7B and 13B models of early 2026 are dramatically better than a year ago. The pattern is sound; it's the hardware and model gap that needs to close.

Further Reading