Skip to main content

The LLM Wiki Now Has a Formal Spec โ€” OKF v0.1 ๐Ÿ“

ยท 5 min read
Gergely Sipos
Frontend Architect

The LLM Wiki pattern โ€” previously just a gist from Karpathy and a growing community convention โ€” now has a formal, versioned specification. On June 12, Google Cloud's Data Cloud team published OKF (Open Knowledge Format) v0.1, a vendor-neutral spec for representing curated knowledge for AI systems. This is the fourth post in our LLM Wiki series (previous posts: the pattern, our wiki mapping, local models), and arguably the most significant development since the idea first gained traction.

What Is OKF?โ€‹

OKF is an open specification for representing metadata, context, and curated knowledge in a format that AI systems can consume reliably. Technically, it's a directory of markdown files with YAML frontmatter and a small set of conventions. That's it.

The spec was published by Sam McVeety and Amir Hormati โ€” both tech leads in BigQuery/Data Analytics Engineering at Google. Despite the Google Cloud origin, the format is explicitly vendor-neutral. There's no proprietary schema, no required runtime, no SDK dependency. As the spec states: "No complex compression scheme, no new runtime, no required SDK."

The only mandatory field is type in the YAML frontmatter. Everything else is optional. You can start with a single markdown file that has type: concept in its frontmatter and call it a valid OKF bundle. Sophistication is additive, not required.

OKF is versioned (v0.1), designed for backward-compatible growth.

Bundle Structureโ€‹

An OKF bundle is a directory with a specific anatomy:

bundle/
โ”œโ”€โ”€ index.md # Optional directory listing
โ”œโ”€โ”€ log.md # Optional chronological history
โ”œโ”€โ”€ <concept>.md # Concept documents
โ””โ”€โ”€ <subdirectory>/
โ”œโ”€โ”€ index.md
โ””โ”€โ”€ <concept>.md
  • index.md โ€” A directory listing that enables progressive disclosure. Agents can read the index first, then decide which concept documents to fetch.
  • log.md โ€” A chronological update history. Useful for agents that need to understand what changed recently.
  • Concept documents โ€” Individual knowledge nodes. Each has YAML frontmatter (must include type) plus a markdown body.

Recommended frontmatter fields beyond type: title, description, resource, tags, timestamp. Cross-linking between documents uses standard markdown links with bundle-relative paths. Consumers must tolerate broken links โ€” a pragmatic acknowledgment that bundles evolve and links rot.

The conformance rule is simple: every non-reserved .md file in the bundle must have parseable YAML frontmatter with a non-empty type field. If it does, the bundle is valid.

What Ships with the Specโ€‹

Three things ship alongside the specification itself:

  • Enrichment agent โ€” Built on Google's Agent Development Kit (ADK) with Gemini. It uses a two-pass architecture: first walks a BigQuery dataset's metadata, then enriches concepts by crawling documentation and adding citations, schemas, and join paths. Supports --no-web for air-gapped environments.
  • Static HTML visualizer โ€” Converts any OKF bundle into an interactive force-directed graph view using Cytoscape.js. Single HTML file, no backend, no data leaves the page.
  • Sample bundles โ€” Three reference bundles built from BigQuery public datasets: GA4 e-commerce, Stack Overflow, and Bitcoin.

Additionally, Google Cloud's Knowledge Catalog has been updated to ingest OKF as a first-class format.

Why This Mattersโ€‹

Interoperability. Until now, every LLM Wiki implementation was bespoke. My bundle had different conventions than yours. OKF gives a shared target that tools can agree on.

Tooling ecosystem. A spec enables tooling from multiple vendors: validators, enrichment agents, visualizers, importers, exporters. One team builds a validator, everyone benefits. That only works when there's agreement on what "valid" means.

From pattern to standard. Karpathy described a pattern โ€” the idea that you should maintain curated markdown knowledge for LLMs. OKF formalizes the data format that implements that pattern. Community projects have already begun citing OKF compliance.

Format, not platform. OKF isn't tied to any cloud, model provider, or agent framework. Its value comes from how many parties speak it โ€” the same network effect that made JSON useful.

BigQuery as first source. It's notable that the first enrichment target is structured data (dataset metadata), not documents. This signals that the team sees OKF as a bridge between structured systems and LLM consumption โ€” not just a documentation format.

How OKF Relates to Our Setupโ€‹

If your team's setup resembles ours, you're already close to OKF compliance. Aliz Web Hub's existing structure โ€” Docusaurus markdown with YAML frontmatter organized in a folder hierarchy โ€” is conceptually identical to an OKF bundle.

The key additions OKF formalizes are:

  1. The mandatory type field in frontmatter
  2. Bundle conventions (index.md for progressive disclosure, log.md for history)
  3. Cross-linking rules and tolerance for broken references

For teams already running LLM Wiki workflows, adopting OKF conventions is low-friction. The foundation is the same: markdown, YAML, directories.

An honest assessment: v0.1 is early. The spec is short, the tooling is reference-quality, and the ecosystem hasn't had time to validate which conventions survive contact with diverse real-world usage. For most teams, the right move is to watch closely and experiment if you're already invested in the LLM Wiki pattern. If you're starting fresh, aligning with OKF from day one costs nothing and positions you for whatever tooling emerges.

Further Readingโ€‹