The honeymoon phase of generative AI is ending. After two years of frantic prototyping and board-level pressure to "do something with AI" that really moves the needle, many leaders are hitting a wall of noise about the best way to build AI applications that deliver meaningful value.
By now almost anybody can spin up some variant of a chatbot, and most have, Elastic Chief Product Officer Ken Exner says drily. “Every HR function, every sales function at this point has a chatbot, their own ‘Clippy,’” he says, referring to Microsoft’s 1990s take on digital assistants.
With models and the harnesses around them improving rapidly, the pressure now is to start delivering agentic architectures that are grounded in hyper-specific enterprise context, says Exner. That means ongoing work to refine Retrieval Augmented Generation (RAG)-based workflows, but also a broader rethinking of what the industry calls “context engineering” – a dynamic set of tools and practices that ensure that models are getting the right information in the right format to do the best possible job.
Exner and Michael Ni of Constellation Research sat down with The Stack to discuss why 2026 is going to be the year of context engineering.
The context bottleneck
Exner argues that organisations’ technology teams often get “fixated” on architectural esoterica, or minutiae around model choices. “None of it matters if you don't have the right data to give context to an agent,” he says, adding that having the right data sources is more important than ever.
As companies move toward agentic architectures, where AI agents don't just talk but take actions, the stakes for accuracy skyrocket. A chatbot giving a wrong answer is a nuisance; an automated agent taking actions that are potentially wrong and “destructive” is an absolute corporate liability.
This challenge has birthed the discipline of context engineering. While prompt engineering was about how you talk to the model, context engineering is about optimising how you find and deliver the most relevant data to an agent.
“Getting the right data to ground the answers or scope the actions of an agent is critical,” says Exner, emphasising that the solution isn't simply giving the AI more data. Despite the trend towards larger context windows – the amount of information a model can process at once – throwing the entire library at a large language model (LLM) leads to context drift, or noise.
The trick, Exner says, is quite the opposite. For latency, accuracy, and privacy, “you want to give an LLM the least amount of the most relevant information.”
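That principle – retrieve a handful of the most relevant chunks rather than handing over the whole corpus – can be sketched in a few lines. This is a toy illustration, not Elastic's implementation: the bag-of-words similarity below stands in for the dense vector embeddings a real retrieval system would use, and the documents are invented.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real systems use dense vector models.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k_context(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Return only the k most relevant chunks, not the whole corpus."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "Fiscal year ends 31 January for all revenue reporting.",
    "Office dog policy: dogs welcome on Fridays.",
    "Revenue sources include subscriptions and professional services.",
]
print(top_k_context("calculate year-end revenue", docs, k=2))
```

Only the two revenue-related snippets reach the model; the irrelevant document never enters the context window, which is exactly the "least amount of the most relevant information" trade-off.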
The New Architecture: From RAG to MCP?
The technical frontier of this shift involves a move away from just retrieval-augmented generation (RAG) towards the use of model context protocol (MCP), giving AI agents access to specific APIs and business logic.
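A toy sketch can show the shape of this idea. The registry below mimics how MCP-style servers advertise tools – a name, a description, and a JSON Schema for inputs – so an agent can discover and invoke business logic rather than just retrieve text. The tool names and handlers are hypothetical, and this is not the actual MCP wire protocol (which runs over JSON-RPC).

```python
import json

# Hypothetical tool registry in the shape MCP tool definitions take:
# each tool advertises a name, a description, and a JSON Schema for
# its inputs, so an agent can discover and call it.
TOOLS = {
    "get_fiscal_year_end": {
        "description": "Return the company's fiscal year-end date.",
        "inputSchema": {"type": "object", "properties": {}},
        "handler": lambda args: "31 January",
    },
    "revenue_by_source": {
        "description": "Break down revenue by a named source.",
        "inputSchema": {
            "type": "object",
            "properties": {"source": {"type": "string"}},
            "required": ["source"],
        },
        "handler": lambda args: {"source": args["source"], "revenue": 1_200_000},
    },
}

def call_tool(name: str, arguments: dict):
    """Dispatch an agent's tool call to the matching handler."""
    return TOOLS[name]["handler"](arguments)

print(call_tool("get_fiscal_year_end", {}))
print(json.dumps(call_tool("revenue_by_source", {"source": "subscriptions"})))
```

The point of the schema is discoverability: the agent can read the descriptions and input contracts at runtime instead of having every integration hard-coded.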
This creates a new challenge: tool selection. When an agent has access to hundreds of different tools and data systems, how does it know which one to pick? Both Exner and Ni agree that this is a return to search, a foundational technology. It is also a foundational enterprise challenge: breaking down data silos, standardising data practices, and doing some heavy cleaning.
Why search? You must be able to parse meaning, extract intent, and navigate complex ontologies to find the one piece of data that matters in a millisecond, Exner says.
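In miniature, tool selection really is a search problem: rank the available tools' descriptions against the user's intent and pick the best match. The sketch below uses simple lexical overlap as a stand-in for the semantic search, intent extraction, and ontology navigation Exner describes; the tool names are invented.

```python
def select_tool(query: str, tools: dict[str, str], k: int = 1) -> list[str]:
    """Treat tool selection as search: rank tool descriptions by word
    overlap with the user's intent (a stand-in for semantic search)."""
    q = set(query.lower().split())

    def score(name: str) -> int:
        return len(q & set(tools[name].lower().split()))

    ranked = sorted(tools, key=score, reverse=True)
    return ranked[:k]

tools = {
    "crm.lookup_account": "Fetch a customer account record from the CRM",
    "billing.invoice_total": "Compute total invoiced revenue for a period",
    "hr.leave_balance": "Show remaining annual leave for an employee",
}
print(select_tool("what revenue did we invoice last quarter", tools))
```

At enterprise scale the same ranking step has to run over hundreds of tools in milliseconds, which is why both men frame it as a search problem rather than a prompting one.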
Data silos have a ripple effect
When data is trapped in silos, the right data often does not reach the LLM, causing inaccurate or incomplete results. LLMs need the right information to complete the task, especially the context surrounding the query. For example, if you ask an LLM to calculate year-end revenue for your sales department, the LLM cannot return an accurate answer without information specific to your company, such as your fiscal year-end date and your defined revenue sources.
Additionally, the LLM also needs to know the audience for the query. To give a sales-focused example, a CMO may want to understand the revenue by marketing channel, while a CFO may want a breakdown of revenue by business unit.
The challenge shouldn’t be oversimplified, Ni and Exner say. Engineers need to run a range of retrieval techniques to most efficiently get information to a model. Parsing and extracting meaning from the data involves connectors, chunking strategies, embedding models, vectorising, and inference services.
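One of those steps, chunking, is easy to illustrate. The sketch below splits a document into overlapping fixed-size windows – one common strategy, shown here with character windows for simplicity; production pipelines usually chunk on sentences or tokens before embedding.

```python
def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping character windows.

    The overlap keeps context that straddles a boundary from being
    lost when each chunk is embedded independently.
    """
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

document = "The quick brown fox jumps over the lazy dog. " * 3
for c in chunk(document):
    print(repr(c))
```

Chunk size and overlap are tuning knobs: too large and retrieval gets noisy, too small and the embeddings lose meaning.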
Elastic’s CPO says the company has worked hard to optimise results for customers and remove some of the heavy lifting. In his experience, the best and most relevant results happen when organisations “combine techniques, such as combining graph traversal together with geospatial search to come together with vector search,” Exner says, giving one example.
By combining techniques, like reranking (reordering retrieved documents based on their relevance to the query) and others, you can get much better outcomes. This can, Exner admits, “get complicated fast” as a workflow.
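One widely used way to combine retrievers is reciprocal rank fusion (RRF), which merges the ranked lists produced by, say, lexical and vector search into a single consensus ordering before any reranking model runs. The sketch below is a generic illustration of the technique, not Elastic's specific implementation; the document IDs are invented.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge ranked lists from different
    retrievers into one ordering. A document scores 1/(k + rank) in
    each list it appears in; k=60 is the commonly used constant."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["doc_a", "doc_b", "doc_c"]   # e.g. keyword/BM25 results
vector  = ["doc_b", "doc_c", "doc_a"]   # e.g. vector search results
print(rrf([lexical, vector]))
```

Here `doc_b` wins because it ranks consistently well in both lists, even though neither retriever put it first and second respectively by the same margin – the kind of "better together" outcome Exner describes, before the workflow gets complicated.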
Ni agrees that many of the organisations he talks to have been stung by their early experiences and are rethinking their approach to getting ROI out of generative AI applications.
“All the early adopters who were creating those chatbots had to put together their own encoders, do their reranking,” Ni says. “Now we're going back to all the lessons that folks had to learn in terms of, how do you actually deliver relevance, how do you actually tune these things and all the tools behind that? I think that this is a really interesting time.”
Early adopters put many of those policies into the agents, or the LLMs themselves; now they are looking at how to scale this and make it work, Ni adds.
Exner says his team and Elastic have valuable lessons to share – and tools to help those looking to deliver more value from their AI applications.
“Our team provides an easy-to-use experience with the best-in-class primitives, ranking models, inference, and encoding models.
“Elastic also provides an end-to-end experience, making it simple and easy to get started, while also making it possible to drop down and configure at the primitive level,” he adds – meaning it can support both broad enterprise adoption and experienced teams of engineers.
Stepping back to survey how fast this environment is evolving, he sums it up: “In 2025, everyone was talking about agents and agentic architectures; 2026? I can guarantee you it's gonna be the year of context engineering.”
Learn how to keep your agents in context with Elasticsearch.
Delivered in partnership with Elastic.