Context engineering is the discipline of designing the information environment that surrounds an AI agent during task execution. Unlike prompt engineering, which focuses on the instruction itself, context engineering considers the full landscape: what documents are available, what tools are accessible, what conversation history is preserved, and how these elements are structured and prioritized.
The distinction matters because modern AI agents operate with context windows that function as working memory. Every token placed in that window has a cost — both computational and attentional. Filling a context window with irrelevant information is like cluttering a desk before trying to solve a difficult problem.
The Context Window as Architecture
Think of the context window not as a text buffer but as an architectural space in which every region serves a purpose. The best agent builders treat context like real estate: they zone it carefully, placing the most important information where the model will attend to it most effectively.
A well-designed context budget might look like this:
// Context budget allocation, as fractions of the window (they sum to 1.0)
const budget = {
  system: 0.15,       // Instructions and persona
  tools: 0.10,        // Available tool definitions
  docs: 0.50,         // Retrieved documents and knowledge
  conversation: 0.25  // Recent interaction history
};
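The fractional shares can be turned into concrete per-section token limits. The sketch below is one way to do that, not a prescribed method: the helper name `allocateTokens` and the 8,000-token window are assumptions chosen for illustration, and the shares are copied from the budget above.

```typescript
// Shares copied from the budget above so this block stands on its own.
const budget: Record<string, number> = {
  system: 0.15,
  tools: 0.10,
  docs: 0.50,
  conversation: 0.25,
};

// Hypothetical helper: converts fractional shares into whole-token limits.
function allocateTokens(
  shares: Record<string, number>,
  windowSize: number
): Record<string, number> {
  const sections = Object.keys(shares);
  const total = sections.reduce((sum, name) => sum + shares[name], 0);

  // Allow for floating-point error, but reject budgets that do not sum to 1.
  if (Math.abs(total - 1) > 1e-9) {
    throw new Error(`Budget shares must sum to 1, got ${total}`);
  }

  const limits: Record<string, number> = {};
  for (const name of sections) {
    // Round down so the limits can never add up to more than the window.
    limits[name] = Math.floor(shares[name] * windowSize);
  }
  return limits;
}

// Illustrative window size; real limits depend on the model in use.
const limits = allocateTokens(budget, 8000);
console.log(limits);
```

Rounding down leaves at most a few tokens unallocated, which is usually preferable to overshooting the window.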
A common mistake is filling the context window with raw documents. Distill instead: use progressive summarization to reduce large documents to their essential points, and include the full text only when the agent signals that it needs more detail.
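A minimal sketch of this summary-first pattern follows. It assumes each document already carries a precomputed summary (the summarization step itself is out of scope here) and that the agent's request for more detail arrives as a list of document IDs. The `DistilledDoc` shape, the `renderDocs` helper, and the sample documents are all hypothetical.

```typescript
// Hypothetical shape: the summary is assumed to be produced ahead of time,
// for example by an earlier summarization pass.
interface DistilledDoc {
  id: string;
  summary: string;
  fullText: string;
}

// Build the document portion of the context: summaries by default,
// full text only for the documents the agent has asked to expand.
function renderDocs(docs: DistilledDoc[], expandedIds: string[]): string {
  return docs
    .map((doc) => {
      const expanded = expandedIds.indexOf(doc.id) !== -1;
      const label = expanded ? "full" : "summary";
      const body = expanded ? doc.fullText : doc.summary;
      return `[${doc.id} | ${label}]\n${body}`;
    })
    .join("\n\n");
}

// Made-up sample documents for illustration.
const sampleDocs: DistilledDoc[] = [
  {
    id: "pricing",
    summary: "Three plans; billing is monthly.",
    fullText: "Plan A costs ... Plan B costs ... Plan C costs ...",
  },
  {
    id: "limits",
    summary: "Rate limits apply per API key.",
    fullText: "Each API key may issue ... requests per minute ...",
  },
];

// First turn: summaries only.
const firstPass = renderDocs(sampleDocs, []);
// Later turn: the agent has asked for the full pricing document.
const secondPass = renderDocs(sampleDocs, ["pricing"]);
```

The context grows only where the agent has shown it needs depth, so every other document stays at summary size.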
Retrieval as Context Curation
Retrieval-Augmented Generation (RAG) is often described as a search problem, but it is fundamentally a context curation problem. The quality of an agent's response depends less on finding the single most relevant document than on assembling the right combination of context pieces.
Good retrieval for context engineering means:
- Precision over recall — three highly relevant paragraphs beat ten tangentially related documents
- Recency weighting — newer information should be privileged when topics evolve quickly
- Source diversity — multiple perspectives prevent the agent from anchoring on a single source
- Hierarchical loading — summaries first, details on demand
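The first three criteria can be combined in a single selection pass. The sketch below is one possible approach, not a standard algorithm. It assumes the retriever supplies a relevance score in [0, 1], weights recency with an exponential half-life, caps the number of items per source, and keeps only a small top set. The `Candidate` and `CurationOptions` shapes, the `curate` function, and the sample values are hypothetical.

```typescript
interface Candidate {
  text: string;
  source: string;    // e.g. a site or collection name
  relevance: number; // score in [0, 1], assumed to come from the retriever
  ageDays: number;   // days since publication
}

interface CurationOptions {
  maxItems: number;     // precision over recall: keep the final set small
  maxPerSource: number; // source diversity: cap items from any one source
  halfLifeDays: number; // recency weighting: score halves every halfLifeDays
}

function curate(candidates: Candidate[], opts: CurationOptions): Candidate[] {
  // Combine relevance with an exponential recency decay, then rank.
  const scored = candidates
    .map((c) => ({
      candidate: c,
      score: c.relevance * Math.pow(0.5, c.ageDays / opts.halfLifeDays),
    }))
    .sort((a, b) => b.score - a.score);

  // Prototype-free object so any source name is a safe key.
  const perSource: Record<string, number> = Object.create(null);
  const selected: Candidate[] = [];

  for (const { candidate } of scored) {
    if (selected.length >= opts.maxItems) break;
    const used = perSource[candidate.source] || 0;
    if (used >= opts.maxPerSource) continue; // skip over-represented sources
    perSource[candidate.source] = used + 1;
    selected.push(candidate);
  }
  return selected;
}

// Made-up candidates for illustration.
const picked = curate(
  [
    { text: "A", source: "docs", relevance: 0.9, ageDays: 0 },
    { text: "B", source: "docs", relevance: 0.8, ageDays: 0 },
    { text: "C", source: "docs", relevance: 0.7, ageDays: 0 },
    { text: "D", source: "blog", relevance: 0.6, ageDays: 0 },
    { text: "E", source: "forum", relevance: 0.95, ageDays: 90 },
  ],
  { maxItems: 3, maxPerSource: 2, halfLifeDays: 30 }
);
console.log(picked.map((c) => c.text));
```

In this sample, the per-source cap drops the third "docs" item in favor of the "blog" item, and the 90-day-old "forum" item falls out despite having the highest raw relevance. The half-life is the knob to tune: a fast-moving topic calls for a short one, a stable topic for a long one.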
The goal is not to find information, but to construct a context window that enables the agent to reason effectively about the user's question.