ai·April 8, 2026·3 min read

LLMs as knowledge compilers

The interesting bit isn't the model answering questions. It's the system that accumulates knowledge between them.



For most of the last two years, the default mental model of an LLM has been: ask a question, get an answer, move on. Retrieval-augmented generation fits that same shape — the model reads some context, produces an answer, then throws the context away. Every query starts from scratch.

A different shape is emerging. Instead of treating the model as a question-answerer, treat it as a knowledge compiler — a system that takes raw material in and produces progressively more structured, persistent knowledge out.

Queries become artifacts

Under the retrieval pattern, a hard question produces an answer and nothing else. Under the compiler pattern, the same question produces a new page in an internal wiki — something that can be referenced, cross-linked, and improved.

The system gets smarter every time someone asks a non-trivial question, not just on the query that was asked but on every adjacent query that touches the same concepts.
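A minimal sketch of the compiler pattern, assuming a hypothetical on-disk page store (`wiki/`) and leaving the actual model call out of scope — the point is only that an answer is persisted and cross-linked rather than thrown away:

```python
import json
import re
from pathlib import Path

WIKI = Path("wiki")

def compile_answer(question: str, answer: str, concepts: list[str]) -> Path:
    """Persist an answer as a wiki page instead of discarding it.

    Each page records the question, the answer, and the concepts it
    touches, so later queries on adjacent topics can find it.
    """
    WIKI.mkdir(exist_ok=True)
    slug = "-".join(re.findall(r"[a-z0-9]+", question.lower()))[:60]
    page = WIKI / f"{slug}.json"
    page.write_text(json.dumps({
        "question": question,
        "answer": answer,
        "concepts": concepts,  # cross-link keys for adjacent queries
    }, indent=2))
    return page

def related_pages(concept: str) -> list[Path]:
    """Find existing pages touching a concept -- the compounding step."""
    return [
        p for p in WIKI.glob("*.json")
        if concept in json.loads(p.read_text())["concepts"]
    ]
```

Every non-trivial question leaves a page behind, and `related_pages` is what makes the next adjacent query cheaper than the first.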

The system improves without touching the weights

Better organization of context beats fine-tuning for most workloads. Cross-references, naming conventions, and structural discipline give you most of what fine-tuning promises, without the cost or the lock-in.

If you find yourself reaching for a custom model, check first whether you've really exhausted structure.

Eventually you need a harness

A single prompt scales to a single task. A system of prompts scales nowhere without a harness around it — something to coordinate retries, enforce schemas, route to the right tool, and preserve memory across sessions.
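A toy harness loop showing two of those jobs — retries and schema enforcement, plus routing to a named tool. The `call_model` function is a hypothetical stand-in for a real LLM API call, and the tool table is illustrative:

```python
import json

def call_model(prompt: str) -> str:
    """Hypothetical stand-in: a real harness would call an LLM here."""
    return json.dumps({"tool": "search", "args": {"query": prompt}})

TOOLS = {
    "search": lambda query: f"results for {query!r}",
}

def run(prompt: str, retries: int = 3) -> str:
    """Retry until the output parses against the expected schema,
    then route the call to the named tool."""
    for _ in range(retries):
        raw = call_model(prompt)
        try:
            msg = json.loads(raw)
            tool, args = msg["tool"], msg["args"]  # schema enforcement
        except (json.JSONDecodeError, KeyError):
            continue  # malformed output: retry instead of crashing
        if tool in TOOLS:
            return TOOLS[tool](**args)
    raise RuntimeError("model never produced a valid tool call")
```

Memory across sessions would be a fourth job layered on top; even in this toy version, notice that none of the reliability lives in the prompt.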

The harness is where the engineering actually lives. Prompt engineering became context engineering. Context engineering is becoming harness engineering. Each step moves more of the intelligence into code and less into text.

Three recent signals

Karpathy's personal knowledge wiki. A pattern for compiling documents into a structured wiki via an LLM, shared as an idea rather than as packaged application code.

The Claude Code leak. Forty permission-gated tools, a three-layer memory system, a 46k-line query engine. The harness, not the model, was what made it useful.

LangChain on continual learning. Three places a system can learn — model weights, harness code, and context documents. The context layer is the cheapest place to improve, and often the most effective.

The evolution

2023–2024   →  prompt engineering
2025        →  context engineering
2026        →  harness engineering

The job keeps moving up the stack.

What to take away

The model is becoming a cheaper and more interchangeable part. The compounding happens outside of it — in the scaffolding, the memory, the data layer, the tool definitions. That's where our clients see the actual leverage, and that's where we spend the most time.

If you're about to kick off an AI project, ask yourself where the compounding will happen. If the answer is "in the model's head," you're building on the wrong layer.


We build the layer that compounds: data foundations, tool integrations, and the scaffolding that makes AI projects actually stick.

Got a similar problem?

30 minutes. We'll tell you honestly what's broken.