Microsoft's MAI-Code-1-Flash is now a default coding model in VS Code

Microsoft shipped MAI-Code-1-Flash on June 2, 2026 — its first in-house coding model, trained directly with GitHub Copilot's production harnesses. It scores 51.2% on SWE-Bench Pro vs Claude Haiku 4.5's 35.2%, costs less under Copilot billing, and is rolling out as a default model inside VS Code. Here's what changes if you build in Copilot.

Microsoft introduced MAI-Code-1-Flash on June 2, 2026 at Build 2026 — its first coding model built end-to-end in-house, and one that now shows up as a default option inside Visual Studio Code’s Copilot. (Source: Microsoft AI, 2026-06-02) For builders living in Copilot, this is the first time the model picker defaults to something Microsoft trained itself rather than a partner’s.

Key facts:

It scores 51.2% on SWE-Bench Pro. Claude Haiku 4.5 scores 35.2% on the same test — a 16-point gap. (Source: Microsoft AI, 2026-06-02)
It is small. Microsoft described it at Build as roughly a 5B-parameter model, in the size class of Haiku.
It uses up to 60% fewer tokens to solve hard problems on SWE-Bench Verified.
It was trained directly with the GitHub Copilot harnesses used in production — so it learned the exact agentic loop it runs in.
It is priced cheaper than Claude Haiku 4.5 under GitHub Copilot’s token-based billing.
It is rolling out now to GitHub Copilot individual users in VS Code, via the model picker and auto picker, with no extra setup.

Benchmark table comparing MAI-Code-1-Flash against Claude Haiku 4.5 across SWE-Bench Pro, SWE-Bench Verified, SWE-Bench Multilingual, and Terminal Bench 2 — Microsoft's own benchmark table: MAI-Code-1-Flash vs Claude Haiku 4.5 across the SWE-Bench family and Terminal Bench 2. (Source: Microsoft AI)

What actually shipped

MAI-Code-1-Flash is the coding half of a two-model launch. Its sibling, MAI-Thinking-1, is a 35B-active-parameter MoE reasoning model with a 256K context window that hits 97% on AIME 2025 and 52.8% on SWE-Bench Pro. (Source: Microsoft AI, 2026-06-02) But Flash is the one most builders will touch first, because it lands inside the editor by default.

The model also clears Haiku 4.5 on instruction-following: a +28.9 advantage on IF Bench and +14.5 on Advanced IF, plus 85.8% adjusted accuracy on adversarial reasoning. Beyond VS Code, it is available as an API endpoint through Azure AI Foundry and GitHub Models, and on OpenRouter, Fireworks, and Baseten.

Bar chart showing MAI-Code-1-Flash leading Claude Haiku 4.5 on IF Bench, Advanced IF, and Robust IF instruction-following benchmarks — Instruction-following benchmarks — where a small coding model often falls apart, Flash holds a clear lead. (Source: Microsoft AI)

What this means if you build in Copilot

The leverage here is the training method, not the size. Most small coding models are trained on generic code and then bolted into an agent loop they never saw. Flash was trained inside the Copilot harness, so the tool calls, retries, and file edits it makes are the ones the harness actually expects. For routine, high-volume work — boilerplate, refactors, test scaffolding — a cheaper model that’s native to your loop can beat a pricier general model that’s merely good at code.

That matters most under Copilot’s new metered pricing. Since the shift to usage-based Copilot billing, every premium request draws down credits — so a model that’s both cheaper and uses fewer tokens per task changes your monthly bill directly. If you’re deciding which agent to trust with unattended runs, the trade-offs in our Claude Code daily-driver guide and the Claude Code vs Codex head-to-head apply here too: route cheap models like Flash to the bulk work, and reserve a frontier model for the hard 10%.

The catch

Two caveats. First, the 5B size is from the Build keynote, not the official model card — Microsoft’s launch page doesn’t disclose the parameter count, so treat it as guidance, not gospel. Second, it’s rolling out gradually (Microsoft cited roughly 10% of individual users initially), so it may not appear in your picker yet. And a 51% SWE-Bench Pro score is strong for its weight class — it is not a frontier model. Use it as your default for routine work, not as a one-model replacement for Sonnet or GPT-5.5 on genuinely hard tasks.

Sources

Introducing MAI-Code-1-Flash — Microsoft AI, 2026-06-02
Microsoft Build 2026: MAI keynote transcript — Microsoft AI, 2026-06-02
Microsoft unveils new AI models to lessen reliance on OpenAI — CNBC, 2026-06-02