We spent the last two years obsessing over prompts, the perfect phrasing, the magic incantation that would unlock AI’s potential. We treated it like a puzzle: find the right words, get the right answer. But that was never really the point. The models were always capable of more. We just weren’t giving them what they needed to show it.
Now, as context windows stretch into hundreds of thousands of tokens and AI systems tackle genuinely complex work, a different skill is emerging as foundational: context engineering. Not the ability to ask cleverly, but the ability to architect understanding itself. As Andrej Karpathy puts it, this is “the delicate art and science of filling the context window with just the right information for the next step.”

From Models to Memory
Think of an LLM like a CPU, and its context window like RAM, the model’s working memory. Just as an operating system manages what data gets loaded into RAM, context engineering manages what information fills the LLM’s context window.
When context windows were small, model choice mattered most. But now that models can process 200,000+ tokens, equivalent to multiple books, the bottleneck has shifted. It’s no longer about whether the model can reason well. It’s about whether you’ve given it the right information, structured the right way. Most agent failures arise from context problems, not model limitations. Poor context management creates context poisoning, distraction, confusion, and clash – problems that no amount of model improvement can fix.
Why Attention Is Finite
Despite their speed and ability to process massive amounts of data, LLMs lose focus just like humans do. Research on needle-in-a-haystack benchmarking has revealed “context rot” – as the number of tokens in the context window increases, the model’s ability to accurately recall information decreases.
This happens across all models, though some degrade more gracefully than others. Context is a finite resource with diminishing returns. Like humans with limited working memory, LLMs have an “attention budget” they draw on when parsing large volumes of information. Every new token depletes this budget, making curation critical.
The reason is architectural. LLMs use transformers, where every token can attend to every other token across the entire context. This creates n² pairwise relationships for n tokens. As context length increases, the model’s ability to capture these relationships gets stretched thin, creating a natural tension between context size and attention focus.
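A quick back-of-the-envelope sketch makes the quadratic growth concrete:

```python
# Pairwise attention relationships grow quadratically with context length,
# so each added token costs more attention than the last.
for n in (1_000, 10_000, 100_000, 200_000):
    pairs = n * n  # every token can attend to every other token
    print(f"{n:>9,} tokens -> {pairs:>18,} token-to-token relationships")
```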
Models also develop their attention patterns from training data where shorter sequences are more common, so they have less experience with context-wide dependencies. While techniques like position encoding interpolation help models handle longer sequences, they come with some degradation in the model’s understanding of token positions.
These factors create a performance gradient rather than a hard cliff. Models remain capable at longer contexts but show reduced precision for information retrieval and long-range reasoning. This reality makes thoughtful context engineering essential for building capable agents.
Four Strategies That Matter
Given finite attention budgets, good context engineering means finding the smallest possible set of high-signal tokens that maximize the likelihood of your desired outcome. Four fundamental strategies keep coming up in practice: write, select, compress, and isolate.
Write
Storing information outside the context window for future use. This is where sessions and memory come in. Sessions represent short-term memory through “scratchpads” – temporary storage for working notes. Memory represents long-term persistence across conversations, like how ChatGPT remembers your coding style or Cursor learns your project structure.
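As a minimal sketch (the class names and JSON file here are illustrative, not any particular framework’s API), a scratchpad plus a persistent memory store might look like this:

```python
import json
from pathlib import Path

class Scratchpad:
    """Short-term, per-session working notes kept outside the context window."""
    def __init__(self):
        self.notes: list[str] = []

    def write(self, note: str) -> None:
        self.notes.append(note)

    def read(self) -> str:
        return "\n".join(self.notes)

class MemoryStore:
    """Long-term memory persisted across sessions as a simple JSON file."""
    def __init__(self, path: str = "memory.json"):
        self.path = Path(path)
        self.items = json.loads(self.path.read_text()) if self.path.exists() else {}

    def remember(self, key: str, value: str) -> None:
        self.items[key] = value
        self.path.write_text(json.dumps(self.items, indent=2))

    def recall(self, key: str):
        return self.items.get(key)

# The agent jots a plan now and persists a preference it can recall in later sessions.
pad = Scratchpad()
pad.write("Plan: 1) reproduce bug 2) write failing test 3) patch parser")
memory = MemoryStore()
memory.remember("coding_style", "prefers small, pure functions with type hints")
```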
Select
Pulling the right information into the context window at the right time. An agent might retrieve notes from its scratchpad, fetch relevant memories using similarity search, or use RAG to load only relevant tools. Good selection is precise about what’s relevant versus what’s merely noise.
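A minimal sketch of selection, using a bag-of-words stand-in for a real embedding model so the snippet stays self-contained:

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a simple bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "How to configure the retry policy for the payments API",
    "Quarterly revenue summary for the sales team",
    "Debugging timeout errors in the payments service",
]

def select(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Pull only the top-k most relevant documents into the context window.
    return sorted(docs, key=lambda d: cosine(embed(query), embed(d)), reverse=True)[:k]

print(select("payments API timing out", documents))
```

In production the embed() stand-in would be an embedding model backed by a vector database, but the shape of the decision stays the same: score everything, admit only the top few.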
Compress
Managing token budgets through summarization and trimming. Long conversations quickly exceed limits. Systems like Claude Code use “auto-compact” when you exceed 95% of the context window.
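The exact mechanics of Claude Code’s auto-compact aren’t public, but the general idea can be sketched like this, with a hypothetical summarize() standing in for an LLM summarization call and word count standing in for a real tokenizer:

```python
def count_tokens(text: str) -> int:
    # Rough stand-in for a real tokenizer: one word is roughly one token.
    return len(text.split())

def summarize(turns: list[str]) -> str:
    # Placeholder: in practice this would be an LLM call that condenses the older turns.
    return f"{len(turns)} earlier turns covering setup, requirements, and first attempts."

def compact(history: list[str], budget: int, threshold: float = 0.95) -> list[str]:
    """If the conversation nears its token budget, summarize older turns and keep recent ones verbatim."""
    total = sum(count_tokens(turn) for turn in history)
    if total < budget * threshold:
        return history
    recent, older = history[-3:], history[:-3]
    return [f"[Summary of earlier conversation] {summarize(older)}"] + recent
```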
Isolate
Splitting information across multiple agents or environments to keep each context focused and manageable.
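A toy sketch of isolation: each sub-agent sees only its own slice of the problem, and the orchestrator passes along compact results rather than full transcripts (the SubAgent class and its run() method are illustrative, not a specific framework’s API):

```python
from dataclasses import dataclass, field

@dataclass
class SubAgent:
    """Each sub-agent gets its own narrow context instead of sharing one giant window."""
    role: str
    context: list[str] = field(default_factory=list)

    def run(self, task: str) -> str:
        self.context.append(task)
        # Placeholder for a model call that sees only this agent's context.
        return f"[{self.role}] result for: {task}"

researcher = SubAgent(role="researcher")
coder = SubAgent(role="coder")
findings = researcher.run("Summarize the three most relevant API docs")
patch = coder.run(f"Implement the fix using these findings: {findings}")
print(patch)
```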
Getting the Altitude Right
System prompts deserve special attention. They should be extremely clear and use simple, direct language that presents ideas at the right altitude for the agent.
The right altitude, according to Anthropic, is the Goldilocks zone between two common failures:
- Too prescriptive: Engineers hardcode complex, brittle logic in prompts to elicit exact behavior. This creates fragility and maintenance nightmares.
- Too vague: Engineers provide high-level guidance that fails to give concrete signals or falsely assumes shared context.
The optimal altitude balances specificity with flexibility: clear enough to guide behavior effectively, yet flexible enough to let the model apply strong heuristics across different situations.
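To make the contrast concrete, here are two hypothetical system prompts for a support agent, one too prescriptive and one at roughly the right altitude:

```python
too_prescriptive = (
    "If the user says 'refund', call check_order_status; if status is 'shipped' and "
    "days_since_purchase < 30, apologize in exactly two sentences, then offer a refund, then..."
)

right_altitude = (
    "You are a support agent for an online store. Resolve billing and shipping issues "
    "using the available tools. Check order state before promising refunds, and "
    "escalate to a human when policy is ambiguous."
)
```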
The Infrastructure Reality
Most organizations aren’t ready for this. The knowledge management problem hits first. Effective context engineering requires clean, well-organized information, but most companies have data scattered everywhere.
There’s the memory architecture challenge. Building systems that can write to scratchpads, maintain session state, and retrieve intelligently requires architectural decisions most teams haven’t considered.
There’s computational cost and latency. Larger context windows are expensive. Assembling context can take longer than inference itself, forcing hard tradeoffs between quality and speed.
And there’s the tooling gap. While frameworks like LangGraph and LlamaIndex help, many teams still build custom solutions for problems that should have standardized tools.
Skills You Actually Need
Context engineering requires an unusual blend:
The four strategies – Understanding when to write context externally versus keeping it in the prompt, how to select without overwhelming the model, and when compression makes sense.
Memory system design – Understanding sessions versus memory, implementing scratchpads, building effective retrieval with embedding models and vector databases.
Information architecture – Good context engineers think like librarians before they think like programmers.
Data engineering – Context engineering is data engineering with a specific purpose: preparing information for AI reasoning.
Model understanding – Knowing how models process information and what causes them to lose track.
Empirical mindset – Context engineering has few universal rules. Good context engineers constantly experiment and iterate based on evidence.
Why Context Engineering Matters for Building Agents
Agents aren’t single-turn systems. They need to remember what they’ve tried, track their progress, and adjust their approach. Without proper context management, agents lose their way: they repeat failed attempts, forget their objectives mid-task, and contradict themselves across steps.
Good context engineering gives agents coherence. When an agent writes its plan to a scratchpad, it can refer back when the context window gets truncated. When it stores learned patterns in memory, it applies those lessons to future tasks. When it selects only relevant information from a large knowledge base, it reasons more clearly without distraction.
This matters because agents operate in environments where perfect instructions are impossible. Agents need to adapt, and adaptation requires working memory: both short-term (sessions) and long-term (memory).
The finite attention budget makes this critical. An agent processing thousands of tokens loses precision. Context rot sets in. Important details get buried. The agent’s reasoning degrades not because the model is incapable, but because the context is poorly managed.
This is why context engineering has become the primary focus for teams building capable agents. You can have the most advanced model available, but without deliberate context management (writing strategically, selecting carefully, compressing intelligently, isolating effectively), you’re building on unstable ground.
Next
Anthropic points out that these techniques will continue evolving as models improve. We’re already seeing that smarter models require less prescriptive engineering, allowing agents to operate with more autonomy. But even as capabilities scale, treating context as a precious, finite resource will remain central to building reliable, effective agents. The technology is capable. It’s waiting for us to give it what it needs. That’s the work ahead.

