MAI-Thinking-1: Microsoft's First In-House Reasoning AI Has Arrived
Microsoft just shipped its own frontier reasoning model — trained from scratch, no OpenAI distillation, and it's gunning for Claude.

For two years, Microsoft funded, deployed, and bet its future on OpenAI's models. At Build 2026 in San Francisco on June 2, that chapter officially started to close. Microsoft CEO Satya Nadella unveiled MAI-Thinking-1 — the company's first in-house frontier reasoning model — built entirely from scratch, trained on commercially licensed data, and distilled from nobody else's weights.
This is not a fine-tune. It is not a wrapper. It is Microsoft's own model, in Microsoft's own class.
What you need to know
- MAI-Thinking-1 is a sparse Mixture-of-Experts model: 35B active parameters, ~1T total, 256K context window.
- Trained entirely from scratch on commercially licensed data — zero distillation from OpenAI or any other third-party model.
- Posts 97.0% on AIME 2025 and 94.5% on AIME 2026 for mathematical reasoning.
- Matches Claude Opus 4.6 on SWE-Bench Pro (52.8%) and was preferred over Claude Sonnet 4.6 in blind human evaluations run by Surge.
- Currently in private preview on Microsoft Foundry; broader distribution via OpenRouter, Fireworks, and Baseten is planned.
- Announced alongside six other new MAI models: coding, image, voice, transcription and more.
Why "trained from scratch" matters
In a world full of models that quietly borrow from each other through distillation, Microsoft made a pointed choice to build something clean. The team's technical report — titled "MAI-Thinking-1: Building a Hill-Climbing Machine" — describes a training loop designed not just to produce one model, but to produce an ever-improving system. The model is the output; the machine is the investment.
That framing is significant. It means Microsoft isn't positioning MAI-Thinking-1 as a one-off capability play. They're positioning it as the first product of a repeatable engine — one that can climb benchmarks on its own data, its own rewards, and its own evaluation process. No borrowed intelligence. No third-party ceiling.
The "no distillation" stance also has enterprise implications. Microsoft's datasets are described as commercially licensed, traceable, and enterprise-grade — exactly the provenance story that regulated industries and large organizations want to hear before deploying AI in production.
What is distillation in AI training?
Distillation is a technique where a smaller "student" model learns by mimicking the outputs of a larger "teacher" model. Many AI labs use outputs from frontier models like GPT-4 or Claude to bootstrap their own training data. Microsoft says MAI-Thinking-1 used none of this — every training signal came from Microsoft's own data and reward models.
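To make the definition concrete, here is a minimal sketch of the classic soft-label form of distillation, in which the student is penalized for diverging from the teacher's probability distribution. Training on a teacher's generated text is the other common variant and is not shown. The logits, the three-word vocabulary, and the temperature are invented for illustration and have nothing to do with any real model's training setup.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities, softened by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical next-token scores over a three-word vocabulary.
teacher = [4.0, 1.0, 0.5]
good_student = [3.8, 1.1, 0.4]   # nearly agrees with the teacher
poor_student = [0.5, 1.0, 4.0]   # prefers a different token entirely

print(distillation_loss(teacher, good_student))
print(distillation_loss(teacher, poor_student))
```

A student that already mimics the teacher gets a loss near zero, and one that disagrees gets a larger loss. Minimizing that number is what "learning from the teacher" means in this setup.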

The architecture: MoE at a trillion parameters
MAI-Thinking-1 uses a sparse Mixture-of-Experts (MoE) architecture. The total parameter count sits around one trillion, but at inference time only about 35 billion of those parameters are active for any given token. The rest stay dormant. This is the same broad design philosophy behind models like Mixtral and, reportedly, GPT-4 (activate what you need, leave the rest off), but at frontier scale.
The practical upshot: Microsoft gets frontier-level reasoning capability at a fraction of the compute cost of a dense trillion-parameter model. Kyle Daigle, Microsoft's developer marketing chief, described it as "high efficiency and performance at a low-token cost." For an enterprise platform serving Foundry customers, that economics argument matters as much as the benchmark numbers.
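The sparse activation described above can be sketched in a few lines. This is a toy illustration of top-k expert routing, the general technique used by MoE models, and not Microsoft's implementation: the eight scalar "experts", the router scores, and the choice of k=2 are all assumptions made up for the example. In real MoE models the routing decision is typically made per token inside each MoE layer. Only the 35B and ~1T figures come from the article.

```python
import math

def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    peak = max(router_logits[i] for i in chosen)
    exps = {i: math.exp(router_logits[i] - peak) for i in chosen}
    total = sum(exps.values())
    return {i: exps[i] / total for i in chosen}

def moe_layer(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = top_k_route(router_logits, k)
    return sum(weight * experts[i](x) for i, weight in gates.items())

# Eight toy experts; each scalar function stands in for a feed-forward block.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
# Hypothetical router scores for a single token.
router_logits = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

gates = top_k_route(router_logits, k=2)
output = moe_layer(3.0, experts, router_logits, k=2)

# Figures quoted in the article: 35B active out of roughly 1T total.
active_fraction = 35e9 / 1e12
print(sorted(gates), output, f"{active_fraction:.1%}")
```

Six of the eight experts never run for this token, which is where the compute saving comes from. With the article's figures, roughly 3.5% of the parameters do work on any one token.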
The 256,000-token context window places it comfortably in the range needed for production-grade workflows — long documents, extended codebases, multi-turn agentic tasks.
- Active parameters: 35B (MoE architecture)
- AIME 2025: 97.0% (mathematical reasoning)
- AIME 2026: 94.5% (mathematical reasoning)
- SWE-Bench Pro: 52.8% (matches Opus 4.6)
Benchmark breakdown: math, code, and humans
Microsoft led with three categories of evidence: formal math benchmarks, coding benchmarks, and human preference evaluations. All three tell a consistent story about a model that belongs in the same conversation as Anthropic's and Google's flagship reasoning systems.
- AIME 2025: 97.0%
- AIME 2026: 94.5%
The AIME (American Invitational Mathematics Examination) scores are the clearest signal of systematic reasoning capability. A 97.0% on AIME 2025 and 94.5% on AIME 2026 suggest the model is working through sustained multi-step problems correctly rather than pattern-matching to memorized solutions. Microsoft describes these results as validation that its training loop can produce genuine reasoning gains from the ground up.
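One way to read the decimals in these scores: each AIME paper has 15 questions, and model evaluations commonly pool the two papers held each year into a 30-question set. On a single pass over 30 questions, no whole number of correct answers yields 97.0% or 94.5%, which suggests the figures are averages over repeated attempts. That is an inference, not something stated in the article. The 30-question pool and the run results below are assumptions, and the run results are invented purely to show the arithmetic.

```python
def single_pass_scores(num_questions):
    """All percentages reachable by one attempt at each question."""
    return [round(100 * correct / num_questions, 1)
            for correct in range(num_questions + 1)]

def mean_accuracy(correct_per_run, num_questions):
    """Average accuracy, in percent, over several independent runs."""
    return 100 * sum(correct_per_run) / (len(correct_per_run) * num_questions)

QUESTIONS = 30  # assumption: AIME I and AIME II pooled, 15 questions each

reachable = single_pass_scores(QUESTIONS)
print(97.0 in reachable, 94.5 in reachable)

# Hypothetical run results, chosen only to show how an average can fall
# between the steps that a single pass allows.
runs = [29, 29, 30, 29, 28, 30, 29, 29, 29, 29]
print(round(mean_accuracy(runs, QUESTIONS), 1))
```

If the scores are averages of this kind, a one-question swing on a single run moves the headline number by only a fraction of a point, which is worth keeping in mind when comparing models that sit within a point or two of each other.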
On software engineering, the model posts 52.8% on SWE-Bench Pro, which places it alongside Claude Opus 4.6 on the hardest publicly available coding benchmark. For context, SWE-Bench Pro involves resolving real GitHub issues in popular open-source repositories — it's not a quiz, it's a simulation of actual software engineering work.
The human evaluation result is the one Microsoft emphasized most in its keynote. Surge — an independent human rating firm — ran 1,276 tasks across single-turn and multi-turn conversations, asking professional raters which model response better advanced the user's goals. MAI-Thinking-1 was preferred over Claude Sonnet 4.6. Not preferred on math. Preferred on helpfulness, instruction-following, appropriate detail, and clarity. That's a harder category to game than a benchmark.
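For readers who want to sanity-check a head-to-head result like this, a minimal sketch of the arithmetic follows. The article gives the task count (1,276) but not how the preferences split, so the win count below is a made-up placeholder. The Wilson score interval is a standard choice for a proportion, not necessarily the method Surge used, and ties are ignored for simplicity.

```python
import math

def wilson_interval(wins, total, z=1.96):
    """Wilson score interval for a win rate (z=1.96 gives about 95% coverage)."""
    rate = wins / total
    denom = 1 + z * z / total
    centre = (rate + z * z / (2 * total)) / denom
    margin = z * math.sqrt(rate * (1 - rate) / total
                           + z * z / (4 * total * total)) / denom
    return centre - margin, centre + margin

TOTAL_TASKS = 1276       # from the article
HYPOTHETICAL_WINS = 700  # placeholder: the real split is not given

low, high = wilson_interval(HYPOTHETICAL_WINS, TOTAL_TASKS)
win_rate = HYPOTHETICAL_WINS / TOTAL_TASKS
print(f"win rate {win_rate:.1%}, 95% interval {low:.1%} to {high:.1%}")

# A preference is only convincing if the whole interval sits above 50%.
print(low > 0.5)
```

The point of the exercise is that "preferred" is a statistical claim. With about 1,300 comparisons, a modest edge can be distinguishable from a coin flip, but the published split is what determines how strong the result really is.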
In blind evaluations across 1,276 real tasks, human raters preferred MAI-Thinking-1 over Claude Sonnet 4.6 — on helpfulness, not just math scores.
What Microsoft announced alongside MAI-Thinking-1
MAI-Thinking-1 didn't arrive alone. Microsoft announced a total of seven new in-house MAI models at Build 2026, spanning the full stack of modalities its products need.
- August 2025: MAI-Voice debuts. Microsoft's voice model launches in Copilot Daily and Podcasts, generating a full minute of audio in under a second on a single GPU, and is available in 15 languages.
- April 2026: MAI-Voice reaches commercial availability. MAI-Voice lands in Microsoft Foundry for production deployments. Microsoft describes it as having fine-grained emotional control and natural prosody.
- June 2, 2026: MAI-Thinking-1 unveiled at Build 2026. Microsoft's first in-house frontier reasoning model launches in private preview on Microsoft Foundry, with 35B active parameters, an MoE architecture, and a 256K context window.
- June 2, 2026: MAI-Code-1-Flash ships to all paying GitHub Copilot subscribers. A 5B-parameter coding model rolls out to every paying GitHub Copilot subscriber on launch day. It outperforms Claude Haiku 4.5 across all four core coding benchmarks Microsoft tested, with a 16-point lead on SWE-Bench Pro.
- Coming soon: Broader MAI distribution. MAI-Thinking-1 expands beyond Foundry to OpenRouter, Fireworks, and Baseten. Voice to Flash, for ultra-latency-sensitive voice agents, is also signalled for a 2026 rollout.
The strategic picture: Microsoft is building its way off OpenAI's bill
Since its launch in 2021, GitHub Copilot has run primarily on OpenAI's models. Claude and other options were added to its model picker over the past year, but the core dependency never went away. MAI-Thinking-1 and MAI-Code-1-Flash change that. Microsoft now has its own frontier reasoning model, its own production-grade coding model, and a credible path to serving its own products and Foundry customers without passing every inference dollar to OpenAI.
This release closes the loop on a strategic pivot Microsoft has been signalling since renegotiating its OpenAI partnership in late 2025. The company that was once AI's most prominent reseller is now a lab with its own models, its own training infrastructure, and its own benchmark credibility.
The "Humanist Superintelligence" framing Microsoft used in its announcements is also worth noting — advanced AI designed to serve people and organizations, not to replace them. Whether or not that framing holds, it signals how Microsoft wants to position the MAI family in the enterprise: trustworthy, traceable, and purpose-built for work.
Private preview, not GA
MAI-Thinking-1 is currently available in private preview on Microsoft Foundry only. Broad public access via OpenRouter, Fireworks, and Baseten has been announced but no date has been confirmed. If you want access today, you'll need to apply through Foundry.
What this means for builders
If you're building on the Anthropic or OpenAI APIs today, MAI-Thinking-1 probably isn't a reason to switch immediately — it's in private preview and the SDK surface isn't public yet. But the signal matters: the reasoning model market just became more competitive, which historically means better models, better pricing, and more pressure on incumbents to ship.
For teams already in the Microsoft stack — Azure, GitHub Copilot, Microsoft Foundry — the path to testing MAI-Thinking-1 will be shorter. MAI-Code-1-Flash is already shipping to every paying GitHub Copilot subscriber, which is a meaningful footprint for day-one distribution.
The broader point is about architecture decisions. MoE at 35B active parameters suggests Microsoft is betting that sparse activation at scale is the right efficiency tradeoff for reasoning-heavy workloads. If that bet proves out in production, it will influence how every lab approaches their next generation of models.
The three things to watch
- When MAI-Thinking-1 moves from private preview to general availability — and what the pricing looks like against Claude Opus and GPT-4o.
- Whether MAI-Code-1-Flash's GitHub Copilot rollout shifts developer preferences away from OpenAI's Codex-based models.
- Microsoft's next model release — if this is truly a "hill-climbing machine," the next iteration is the real test of whether the training loop compounds.
It's achieved 97 percent on AIME 2025 — the key measure of general-purpose reasoning — but most importantly, it's now at 53 percent on SWE-Bench Pro, which places it right alongside Opus 4.6 on the toughest coding benchmark out there.
MAI-Thinking-1 is the most concrete signal yet that the AI landscape is entering a new phase — one where the biggest platforms don't just deploy frontier models, they build them. That's a different kind of competitive pressure for Anthropic, OpenAI, and Google than they've faced before. Microsoft has distribution, enterprise relationships, and now benchmark credibility. The next twelve months will tell us whether it has compounding intelligence.
Try it today
MAI-Thinking-1 is available in private preview via Microsoft Foundry. MAI-Code-1-Flash is already live for all paying GitHub Copilot subscribers — no waitlist required.