Jul 4, 2026

How to Run Leanstral 1.5 for Lean 4 Proofs (Local vLLM and the Free API Shortcut)

Read time: ~7 minutes. What you’ll learn: what Leanstral 1.5 is and why a Lean 4 proof model is worth your attention, what it can actually do (the benchmark numbers and the real bugs it found), and the two real ways to run it — the free Mistral API via the Vibe CLI (the easy path) and a self-hosted vLLM server (4-GPU, for on-prem) — plus the Lean 4 setup, the honest hardware picture, and the sampling settings Mistral recommends.

Sourcing note: the model spec, vLLM command, sampling settings, and Lean 4 / Vibe integration steps are quoted from the official mistralai/Leanstral-1.5-119B-A6B model card and Mistral’s Leanstral 1.5 announcement. Benchmark numbers are Mistral’s own. No GGUF/Ollama build is published at the time of writing, so this guide doesn’t invent one. Links at the bottom.

Most “AI for code” models help you write code faster. Leanstral 1.5 is a different animal: it helps you prove code (and math) correct. It’s Mistral’s model for Lean 4 — the proof assistant used across formal mathematics and high-assurance software — and version 1.5 (released June 30, 2026) is strong enough to saturate the standard proving benchmark and turn up real bugs in real repositories. If you work in formal methods, verified software, or competition math, this is worth running. Here’s how.

1. What Leanstral 1.5 actually is

Leanstral 1.5 is an open-weight model tuned specifically for Lean 4 automated theorem proving and autoformalization (turning natural-language math into formal Lean statements). From the official model card:

119B total parameters, 6.5B active — a Mixture-of-Experts model with 128 experts, 4 active per token. You get the knowledge capacity of a 119B model at roughly the inference cost of a ~6.5B one.
256K context window (Mistral recommends staying ≤200K in practice).
License: Apache 2.0 — open weights, commercially usable.
It operates as an agent: it doesn’t just emit a proof, it drives a loop — proposing Lean tactics, getting feedback from the Lean compiler, and iterating until the proof checks.

That last point is the key mental model. Leanstral isn’t a chatbot that guesses proofs; it’s designed to run against a real Lean 4 toolchain, using the compiler as ground truth. That’s what makes its output trustworthy — a proof either type-checks in Lean or it doesn’t.

2. What it can actually do

Mistral’s reported results (their own numbers, pending independent reproduction):

Benchmark	Leanstral 1.5
miniF2F (val + test)	100% (saturated)
PutnamBench	587 / 672 solved
FATE-H	87% (state-of-the-art)
FATE-X	34% (state-of-the-art)
FLTEval	Pass@1 28.9, Pass@8 43.2

Two things stand out. First, it saturates miniF2F — the long-standing formal-proof benchmark is essentially solved at this scale. Second, it shows strong test-time scaling: on PutnamBench it solves 493 problems at a 1M-token budget and 587 at 4M tokens, so throwing more inference at a hard problem genuinely helps. Mistral also reports it edges out Seed-Prover 1.5 on PutnamBench at roughly $4 per problem versus an estimated $300+ — a large cost gap if it holds up.

And it’s not just benchmarks: Mistral says Leanstral 1.5 uncovered 5 previously unknown bugs across 57 repositories it was tested on, including a buffer overflow in the datrs/varinteger library. Formal verification finding real memory bugs is the practical payoff.

3. Prerequisite: a working Lean 4 toolchain

Whichever way you run the model, Leanstral is only useful with Lean 4 installed, because the agent checks its proofs against the real compiler. If you don’t already have it:

Install elan (the Lean toolchain manager) and let it pull a recent Lean 4 + Mathlib if you’re doing math proofs.
Confirm lake (Lean’s build tool) works in your project before pointing the agent at it.

Without a Lean toolchain, the model can still emit proof text, but you lose the whole point — compiler-verified correctness.

4. Method 1 — Mistral Vibe + the free API (easiest)

For almost everyone, this is the right starting point. Leanstral 1.5 is available as a free API endpoint (leanstral-1-5, listed at $0 on Mistral’s Labs tier), and Mistral’s Vibe CLI wraps it into a ready-made Lean agent. No GPUs, no weights download.

# Install the Vibe CLI
uv tool install mistral-vibe

Then, per the model card:

Enable Labs models in your Mistral account settings (Leanstral 1.5 lives on the Labs tier).
Run the /leanstall setup command inside Vibe to wire up the Lean agent.
Launch the agent:

vibe --agent lean

That drops you into the Lean-aware agent backed by the hosted leanstral-1-5 model. Point it at your Lean project and let it work proofs against your local lake build. Because the model is free on the Labs tier, this is also the cheapest way to evaluate whether Leanstral fits your workflow before committing any hardware.

5. Method 2 — Self-hosted vLLM (on-prem, 4-GPU)

If you need the weights on your own hardware — air-gapped proof work, data control, or heavy batch proving — serve the official weights with vLLM. This is the command straight from the model card (note it defaults to 4-way tensor parallelism — this is a 119B model, so plan for a multi-GPU box):

vllm serve mistralai/Leanstral-1.5-119B-A6B \
  --max-model-len 200000 \
  --tensor-parallel-size 4 \
  --attention-backend FLASH_ATTN_MLA \
  --tool-call-parser mistral \
  --enable-auto-tool-choice \
  --reasoning-parser mistral

Requirements and notes:

vLLM ≥ 0.24.0.
--tensor-parallel-size 4 — the default targets 4 GPUs. The weights are 119B total; even though only 6.5B activate per token (so inference is fast once loaded), you still have to hold the full parameter set in GPU memory. This is workstation/server territory, not a laptop.
--tool-call-parser mistral + --enable-auto-tool-choice + --reasoning-parser mistral are what let the agent call the Lean tools and separate its reasoning — don’t drop them.
--max-model-len 200000 matches Mistral’s ≤200K recommendation; raise/lower to fit your memory.

Then point Vibe’s Lean agent at your local server instead of the hosted API by editing the agent config at ~/.vibe/agents/lean.toml to use your vLLM endpoint URL. Now the whole loop — model + Lean compiler — runs on your hardware.

6. The hardware reality

Be clear-eyed: 119B total parameters means the local route needs real GPUs (the default is 4-way tensor parallel). The MoE design helps inference speed — only 6.5B parameters fire per token — but it does not shrink the memory you need to load the model. So:

Have a 4-GPU server (or comparable) and need on-prem? Method 2 is for you.
Everyone else — researchers, students, anyone evaluating — should use Method 1. The API is free on the Labs tier, so there’s little reason to buy hardware just to try it. Validate the workflow first; self-host only if data control or volume demands it.

There’s no official GGUF/Ollama build at the time of writing, so single-GPU-quantized local runs aren’t a documented path yet — if community quants appear later, verify them before trusting a proof pipeline to them.

7. Sampling and usage notes

From the model card:

Temperature: 1.0.
Reasoning effort: 'high' for hard proofs (more internal search, more tokens — which, given the test-time-scaling result in §2, is exactly where the extra problems get solved), or 'none' for quick, cheap attempts.
Keep context ≤ 200K even though 256K is supported.

Practical tip: for a genuinely hard theorem, lean into effort 'high' and a large token budget — the PutnamBench 493→587 jump from 1M to 4M tokens shows the model keeps finding proofs when you give it room to search.

8. Who this is actually for

Leanstral 1.5 is a specialist, and that’s the point:

Formal-methods and verified-software teams — proving properties of critical code, or the bug-finding use case (memory safety, correctness) that surfaced real issues in 57 repos.
Mathematicians and competition-math folks using Lean 4 / Mathlib — autoformalization and automated proving.
Researchers in automated reasoning who want an open-weight, Apache-2.0 baseline that’s state-of-the-art on FATE-H/FATE-X.

If you’re writing everyday application code, a general coding model is the better tool — see our Kimi K2.7 Code guide or Ornith-1.0 locally. Leanstral earns its place precisely when “probably correct” isn’t good enough and you need a machine-checked proof.

The takeaway

Leanstral 1.5 brings state-of-the-art Lean 4 theorem proving to open weights — 119B/6.5B-active MoE, 256K context, Apache 2.0, saturating miniF2F and solving 587/672 PutnamBench problems. Running it is a two-path decision: start with Method 1 — the free leanstral-1-5 API via vibe --agent lean, no hardware — to see if it fits your proof work, and move to Method 2 — self-hosted vLLM on a 4-GPU box — only when you need the weights on-prem. Either way, pair it with a real Lean 4 toolchain so every proof it produces is compiler-verified, not just plausible.

For general-purpose local coding models, see Kimi K2.7 Code locally, Ornith-1.0 locally, and Qwen 3.6 for local coding.

Sources

mistralai/Leanstral-1.5-119B-A6B — Hugging Face model card — 119B total / 6.5B active (128 experts, 4/token), 256K context (≤200K recommended), Apache 2.0, the vLLM serve command (vLLM ≥0.24.0, tensor-parallel 4, mistral tool/reasoning parsers), temperature 1.0 + reasoning effort high/none, Vibe /leanstall + vibe --agent lean, ~/.vibe/agents/lean.toml for local server
Leanstral 1.5: Proof Abundance for All — Mistral AI — miniF2F 100%, PutnamBench 587/672 (493@1M → 587@4M tokens), FATE-H 87%, FATE-X 34%, FLTEval Pass@1 28.9 / Pass@8 43.2, ~$4/problem vs Seed-Prover ~$300, 5 unknown bugs across 57 repos (incl. datrs/varinteger buffer overflow), free Labs API
Leanstral 1.5 — Mistral Docs model card — API model name leanstral-1-5 / labs-leanstral-1-5, $0 Labs tier, released June 30, 2026
Benchmark numbers are Mistral’s own reported results, pending independent reproduction. No official GGUF/quantized build exists at the time of writing. Verified July 4, 2026 — confirm current details on the official model card before relying on them.