The honeymoon phase of generative AI is ending. After two years of frantic prototyping and board-level pressure to "do something with AI" that really moves the needle, many leaders are hitting a wall of noise about the best way to build AI applications that deliver meaningful value.
By now almost anybody can spin up some variant of a chatbot, and most have, Elastic Chief Product Officer Ken Exner says drily. “Every HR function, every sales function at this point has a chatbot, their own ‘Clippy,’” he says, referring to Microsoft’s 1990s take on digital assistants.
With models and the harnesses around them improving rapidly, the pressure now is to start delivering agentic architectures that are grounded in hyper-specific enterprise context, says Exner. That means ongoing work to refine Retrieval Augmented Generation (RAG)-based workflows, but also a broader rethinking of what the industry calls “context engineering” – a dynamic set of tools and practices that ensure that models are getting the right information in the right format to do the best possible job.
Exner and Michael Ni of Constellation Research sat down with The Stack to discuss why 2026 is going to be the year of context engineering.
The context bottleneck
Exner argues that organisations’ technology teams often get “fixated” on architectural esoterica, or minutiae around model choices. “None of it matters if you don't have the right data to give context to an agent,” he says, adding that having the right data sources is more important than ever.
As companies move toward agentic architectures, where AI agents don't just talk but take actions, the stakes for accuracy skyrocket. A chatbot giving a wrong answer is a nuisance; an automated agent taking actions that are potentially wrong and “destructive” is an absolute corporate liability.
This challenge has birthed the discipline of context engineering. While prompt engineering was about how you talk to the model, context engineering is about optimising how you find and deliver the most relevant data to an agent.
“Getting the right data to ground the answers or scope the actions of an agent is critical,” says Exner, emphasising that the solution isn't simply giving the AI more data. Despite the trend towards larger context windows – the amount of information a model can process at once – throwing the entire library at a large language model (LLM) leads to context drift, or noise.
The trick, Exner says, is quite the opposite. For latency, accuracy, and privacy, “you want to give an LLM the least amount of the most relevant information.”
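That principle – retrieve a handful of the most relevant chunks rather than handing over the whole corpus – can be sketched in a few lines. This is a toy illustration, not Elastic's implementation: the bag-of-words similarity below stands in for the dense vector embeddings a real retrieval system would use, and the documents are invented.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real systems use dense vector models.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k_context(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return only the k most relevant chunks, not the whole corpus."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "Fiscal year ends 31 January for all revenue reporting.",
    "Office dog policy: dogs welcome on Fridays.",
    "Revenue sources include subscriptions and professional services.",
]
print(top_k_context("calculate year-end revenue", docs, k=2))
```

Only the two revenue-related snippets reach the model; the irrelevant document never enters the context window, which is exactly the "least amount of the most relevant information" trade-off.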
The New Architecture: From RAG to MCP?
The technical frontier of this shift involves a move away from just retrieval-augmented generation (RAG) towards the use of model context protocol (MCP), giving AI agents access to specific APIs and business logic.
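A toy sketch can show the shape of this idea. The registry below mimics how MCP-style servers advertise tools – a name, a description, and a JSON Schema for inputs – so an agent can discover and invoke business logic rather than just retrieve text. The tool names and handlers are hypothetical, and this is not the actual MCP wire protocol (which runs over JSON-RPC).

```python
import json

# Hypothetical tool registry in the shape MCP tool definitions take:
# each tool advertises a name, a description, and a JSON Schema for
# its inputs, so an agent can discover and call it.
TOOLS = {
    "get_fiscal_year_end": {
        "description": "Return the company's fiscal year-end date.",
        "inputSchema": {"type": "object", "properties": {}},
        "handler": lambda args: "31 January",
    },
    "revenue_by_source": {
        "description": "Break down revenue by a named source.",
        "inputSchema": {
            "type": "object",
            "properties": {"source": {"type": "string"}},
            "required": ["source"],
        },
        "handler": lambda args: {"source": args["source"], "revenue": 1_200_000},
    },
}

def call_tool(name: str, arguments: dict):
    """Dispatch an agent's tool call to the matching handler."""
    return TOOLS[name]["handler"](arguments)

print(call_tool("get_fiscal_year_end", {}))
print(json.dumps(call_tool("revenue_by_source", {"source": "subscriptions"})))
```

The point of the schema is discoverability: the agent can read the descriptions and input contracts at runtime instead of having every integration hard-coded.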
This creates a new challenge: tool selection. When an agent has access to hundreds of different tools and data systems, how does it know which one to pick? Both Exner and Ni agree that this is a return to search, a foundational technology. It is also a foundational enterprise challenge: breaking down data silos, standardising data practices, and doing some heavy cleaning.
Why search? You must be able to parse meaning, extract intent, and navigate complex ontologies to find the one piece of data that matters in a millisecond, Exner says.
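In miniature, tool selection really is a search problem: rank the available tools' descriptions against the user's intent and pick the best match. The sketch below uses simple lexical overlap as a stand-in for the semantic search, intent extraction, and ontology navigation Exner describes; the tool names are invented.

```python
def select_tool(query: str, tools: dict[str, str], k: int = 1) -> list[str]:
    """Treat tool selection as search: rank tool descriptions by word
    overlap with the user's intent (a stand-in for semantic search)."""
    q = set(query.lower().split())

    def score(name: str) -> int:
        return len(q & set(tools[name].lower().split()))

    ranked = sorted(tools, key=score, reverse=True)
    return ranked[:k]

tools = {
    "crm.lookup_account": "Fetch a customer account record from the CRM",
    "billing.invoice_total": "Compute total invoiced revenue for a period",
    "hr.leave_balance": "Show remaining annual leave for an employee",
}
print(select_tool("what revenue did we invoice last quarter", tools))
```

At enterprise scale the same ranking step has to run over hundreds of tools in milliseconds, which is why both men frame it as a search problem rather than a prompting one.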
Data silos have a ripple effect
When data is trapped in silos, the right data often does not reach the LLM, causing inaccurate or incomplete results. LLMs need the right information to complete the task, especially the context surrounding the query. For example, if you ask an LLM to calculate year-end revenue for your sales department, the LLM cannot return an accurate answer without information specific to your company, such as your fiscal year-end date and your defined revenue sources.
Additionally, the LLM also needs to know the audience for the query. To give a sales-focused example, a CMO may want to understand the revenue by marketing channel, while a CFO may want a breakdown of revenue by business unit.
The challenge shouldn’t be oversimplified, Ni and Exner say. Engineers need to run a range of retrieval techniques to most efficiently get information to a model. Parsing and extracting meaning from the data involves connectors, chunking strategies, embedding models, vectorising, and inference services.
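One of those steps, chunking, is easy to illustrate. The sketch below splits a document into overlapping fixed-size windows – one common strategy, shown here with character windows for simplicity; production pipelines usually chunk on sentences or tokens before embedding.

```python
def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping character windows.

    The overlap keeps context that straddles a boundary from being
    lost when each chunk is embedded independently.
    """
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

document = "The quick brown fox jumps over the lazy dog. " * 3
for c in chunk(document):
    print(repr(c))
```

Chunk size and overlap are tuning knobs: too large and retrieval gets noisy, too small and the embeddings lose meaning.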
Elastic’s CPO says the company has worked hard to optimise results for customers and remove some of the heavy lifting. In his experience, the best and most relevant results happen when organisations “combine techniques, such as combining graph traversal together with geospatial search to come together with vector search,” Exner says, giving one example.
By combining techniques, like reranking (reordering retrieved documents based on their relevance to the query) and others, you can get much better outcomes. This can, Exner admits, “get complicated fast” as a workflow.
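One widely used way to combine retrievers is reciprocal rank fusion (RRF), which merges the ranked lists produced by, say, lexical and vector search into a single consensus ordering before any reranking model runs. The sketch below is a generic illustration of the technique, not Elastic's specific implementation; the document IDs are invented.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge ranked lists from different
    retrievers into one ordering. A document scores 1/(k + rank) in
    each list it appears in; k=60 is the commonly used constant."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["doc_a", "doc_b", "doc_c"]   # e.g. keyword/BM25 results
vector  = ["doc_b", "doc_c", "doc_a"]   # e.g. vector search results
print(rrf([lexical, vector]))
```

Here `doc_b` wins because it ranks consistently well in both lists, even though neither retriever put it first and second respectively by the same margin – the kind of "better together" outcome Exner describes, before the workflow gets complicated.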
Ni agrees that many of the organisations he talks to have been stung by their early experiences and are rethinking their approach to getting ROI out of generative AI applications.
“All the early adopters who were creating those chatbots had to put together their own encoders, do their reranking,” Ni says. “Now we're going back to all the lessons that folks had to learn in terms of, how do you actually deliver relevance, how do you actually tune these things and all the tools behind that? I think that this is a really interesting time.”
Early adopters put many of those policies into the agents, or the LLMs themselves; now they are looking at how to scale this and make it work, Ni adds.
Exner says his team and Elastic have valuable lessons to share – and tools to help those looking to deliver more value from their AI applications.
“Our team provides an easy-to-use experience with the best-in-class primitives, ranking models, inference, and encoding models.
“Elastic also provides an end-to-end experience, making it simple and easy to get started, while also making it possible to drop down and configure at the primitive level,” he adds – meaning it can support both broad enterprise adoption and experienced teams of engineers.
Stepping back to survey how fast this environment is evolving, he sums it up: “In 2025, everyone was talking about agents and agentic architectures; 2026? I can guarantee you it's gonna be the year of context engineering.”
Learn how to keep your agents in context with Elasticsearch.
Delivered in partnership with Elastic.