Microsoft has been building its own AI models, large and small, for many years; in fact, five of the seven ‘new’ in-house frontier models announced with strong benchmark results at its Build conference this week are updates of models it already had.
But the announcement itself, and the development of its first ‘reasoning’ model to compete with Claude Opus 4.x and GPT-5.5, are an explicit declaration that it’s not replacing what became a fraught dependency on OpenAI with a potentially similar one on Anthropic.
The flagship model of the announcement was Microsoft's MAI-Thinking-1; Redmond says it "trained it from the ground up on clean data*, without distillation from third-party models."
It says the mixture-of-experts model, which has 35 billion active parameters, supports long context with a "256k token window (enough to fit a 600 page document), function calling, and the flexibility to add developer instructions."
- See the MAI-Thinking-1 technical paper [pdf]
- See the MAI-Thinking-1 model card [pdf]
OpenAI may struggle to replace Microsoft services it depends on, like Cosmos DB for ChatGPT memory and AKS for training its models.
Meanwhile, Microsoft can move its own products and some of its customers to its own AI stack (including both its new MAI and local models), reuse the custom Maia silicon it designed for the OpenAI partnership, and get both better economics and better governance.
MAI models have a slightly smaller footprint and promise similar or better results at half to a quarter of the cost of other frontier models. With the recent switch to charging for actual token usage, Microsoft is making MAI-Code-1-Flash the default in GitHub Copilot. Redmond claims it requires up to 60% fewer tokens than the Claude models already available in GitHub Copilot, VS Code and Excel Copilot; that may be popular.
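To see how a token saving feeds through to the bill under usage-based pricing, here is a minimal back-of-the-envelope sketch. The 60% figure is Redmond's claim; the function name and the per-token price ratio are illustrative assumptions, since no per-token prices are given here.

```python
def relative_cost(token_reduction: float, price_ratio: float = 1.0) -> float:
    """Return the bill as a fraction of the baseline bill.

    token_reduction: fraction of tokens saved versus the baseline model
                     (0.6 means "60% fewer tokens").
    price_ratio:     hypothetical per-token price relative to the baseline
                     (1.0 means the same price per token).
    """
    if not 0.0 <= token_reduction <= 1.0:
        raise ValueError("token_reduction must be between 0 and 1")
    if price_ratio < 0.0:
        raise ValueError("price_ratio must not be negative")
    return (1.0 - token_reduction) * price_ratio


# Redmond's best case ("up to 60% fewer tokens") at an identical per-token price.
same_price = relative_cost(0.6)

# The same token saving combined with a hypothetical half-price token rate.
half_price = relative_cost(0.6, price_ratio=0.5)

print(f"Same per-token price: {same_price:.0%} of the baseline bill")
print(f"Half per-token price: {half_price:.0%} of the baseline bill")
```

At an identical per-token price, the best-case token saving alone would leave 40% of the baseline bill; any difference in per-token price multiplies on top of that.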
MAI-Image-2.5 is already used in PowerPoint and OneDrive, and MAI-Transcribe-1.5 handles transcription for Copilot, Teams, GitHub, and Dynamics 365 Contact Centre. The MAI models will also be available on Windows, at least on Arm devices with the new NVIDIA/MediaTek N1X SoC; local models are another way to get costs down (for both users and Microsoft).
Your model, your moat
Fine-tuning Thinking-1 gives even better results: some Excel Copilot prompts use a new tuned MAI model that matches GPT-5.4 on results but is 10x more cost efficient. As well as tuning MAI models to replace OpenAI in its own products, Microsoft will offer frontier tuning to customers to create their own private frontier models. MAI models tuned for McKinsey and Land O’Lakes were both 10x more cost efficient than GPT-5.5 for their very different workloads.
Like the various Microsoft IQ services that add your enterprise context to agents and models, private RL tuning using data from Microsoft 365, Fabric and a post-deployment feedback loop promises that your data will make your version of the model better – without doing the same for your competitors.
The new models go hand in hand with new tools for governance, policy and FinOps, and Microsoft can also promise enterprises less risk. Thinking-1 is explicitly trained for safety rather than having guardrails bolted on afterwards, and was trained from scratch on “clean and appropriately licenced data”: no distillation from other models, no synthetic or AI-generated content, no exposure to billion-dollar Anthropic-style settlements.
It’s not only pitching Azure and Foundry customers either; the MAI models are available on Fireworks, Baseten and OpenRouter as well.
Supercal-AI-frag-AI-listicexp-AI-lidocious
Microsoft calls both the process it used to build the MAI models in its Superintelligence Lab and the frontier tuning service “hill climbing”.
This technical term for the incremental optimisations that create and tune models is also a less worrying phrase than the recursive self-improvement Anthropic and OpenAI are pitching as the current route to AGI.
CEO Satya Nadella ended his presentation with a plea to businesses to remember AI is there to serve humans, not replace them, and to avoid scenarios where “technology concentrates power, reduces human agency and leaves the society to absorb the consequences”; a contrast in tone to the bombast of other frontier labs.
*Per the model card, MAI-Thinking-1 was trained on "30T tokens extracted from a mixture of publicly available and licensed human-generated data covering web data, public GitHub code, books, academic papers, news, multilingual text, and domain-specific materials. Each of these sources are processed in-house from start to finish. We choose to not use any synthetic data generated by language models during pre-training and make an effort to avoid and remove AI-generated content within collected data sources. For pre-training, we do not use any open source training datasets and decontaminate common machine learning databases from our training data."