Research · 2 min read

OpenAI's internal model disproves an 80-year-old Erdős conjecture — and it shipped with a Lean-verified proof

On May 20, 2026, OpenAI announced that an internal reasoning model had disproved Paul Erdős's 1946 planar unit distance conjecture, with the proof independently verified by Fields Medalist Tim Gowers and Princeton's Will Sawin. The builder takeaway is not “AI does math”; it's that the proof shipped with a Lean formalisation, which largely removes the human-review bottleneck.


On May 20, 2026, OpenAI announced that one of its internal reasoning models had disproved a central conjecture in discrete geometry: Paul Erdős’s conjecture on the planar unit distance problem, first posed in 1946.

The problem sounds deceptively simple: place n points in the plane; at most how many pairs of them can be exactly one unit apart? For nearly 80 years, the working assumption was that suitably scaled square grids are essentially optimal, so that no arrangement beats them by more than a subpolynomial factor. OpenAI’s model produced an infinite family of constructions that beats the grid by a polynomial factor, drawing on algebraic number theory and infinite class field towers: branches of mathematics in which most LLMs have never demonstrated working knowledge, let alone produced novel results.
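To make the grid baseline concrete, here is a small Python sketch of the classical construction (our illustration, not OpenAI’s method): take a k-by-k integer grid, find the distance shared by the most pairs of points, and rescale the plane so that distance becomes one unit. The helper names `grid` and `best_distance` are hypothetical, and the brute-force pair count is only practical for small grids.

```python
from collections import Counter
from itertools import combinations


def grid(k: int) -> list[tuple[int, int]]:
    """All integer points (x, y) with 0 <= x, y < k: a k-by-k square grid."""
    return [(x, y) for x in range(k) for y in range(k)]


def best_distance(points: list[tuple[int, int]]) -> tuple[int, int]:
    """Return (d2, count): the squared distance shared by the most point pairs.

    Scaling the plane by 1/sqrt(d2) turns each of those `count` pairs into a
    unit-distance pair, so `count` is a lower bound for len(points) points.
    Ties are broken toward the smaller squared distance.
    """
    # Exact integer arithmetic: squared distances avoid floating-point error.
    counts = Counter(
        (px - qx) ** 2 + (py - qy) ** 2
        for (px, py), (qx, qy) in combinations(points, 2)
    )
    return max(counts.items(), key=lambda item: (item[1], -item[0]))


if __name__ == "__main__":
    print(best_distance(grid(5)))  # (5, 48)
```

On a 5-by-5 grid the winning distance is √5 rather than 1 (48 pairs versus 40), because 5 = 1² + 2² can be written as a sum of two squares in more ways than 1 can. Erdős’s 1946 lower bound pushes the same idea much further by choosing distances with very many such representations.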

Why this one is different from the usual “AI does math” headline

OpenAI’s previous attempt at a related Erdős problem had to be retracted after mathematicians found gaps. This release was structured to avoid a repeat:

  1. External verification before the announcement: the proof was independently checked by Tim Gowers (Fields Medalist, Cambridge) and Will Sawin (Princeton), who confirmed and even tightened the result. Per Scientific American’s reporting, additional review came from Daniel Litt (Toronto), Jacob Tsimerman (Toronto), Thomas Bloom, and Melanie Matchett Wood (Harvard).
  2. Lean-formalised proof: the full proof PDF is accompanied by a Lean formalisation. Lean is the proof assistant Terence Tao has been championing. Once a proof passes Lean’s type-checker, the “trust” question collapses to “do you trust Lean’s kernel?”, plus a human check that the formal statement really says what the conjecture says. That is a far smaller surface than “do you trust an LLM transcript?”
  3. OpenAI’s own mathematicians wrote the framing: Mehtaab Sawhney and Mark Sellke (both at OpenAI) authored the human-readable write-up. Sawhney described the interaction with the model as “kind of an amazing experience to have a machine give back something which really resembles how I work”.
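For readers who have not used Lean, a toy Lean 4 snippet shows what “passing the type-checker” in point 2 means. It assumes core Lean only (no Mathlib) and is unrelated to OpenAI’s actual formalisation; `sqDist` and `my_add_comm` are hypothetical names. If either claim were false, or a proof step were wrong, Lean would reject the file; what remains for a human is checking that each stated theorem says what was intended.

```lean
-- Toy illustration only; not taken from OpenAI's formalisation.
-- `sqDist` is a hypothetical helper: squared distance between integer points.
def sqDist (p q : Int × Int) : Int :=
  (p.1 - q.1) * (p.1 - q.1) + (p.2 - q.2) * (p.2 - q.2)

-- A concrete fact, closed by `decide`, which evaluates the claim.
example : sqDist (0, 0) (3, 4) = 25 := by decide

-- A universally quantified fact, closed by a lemma from Lean's core library.
theorem my_add_comm (a b : Nat) : a + b = b + a := Nat.add_comm a b
```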

What this means if you’re building on LLMs

Three things actually matter for builders:

1. An internal model, not a publicly available one. This wasn’t GPT-5.5 or anything you can hit via API today. OpenAI explicitly says it’s “an internal reasoning model.” If you’re benchmarking model selection for your own product, the public API surface still lags what these labs run internally — possibly by 6-12 months. Plan model selection accordingly: today’s frontier API is tomorrow’s mid-tier.

2. Lean (or any proof checker) is the right primitive for “AI did this, but can I trust it?” The pattern generalises far beyond pure math. If you’re building any AI-assisted system that produces verifiable artifacts (refactored code with property tests, generated SQL with schema validation, contract drafts with rule checking), putting a deterministic verifier in the loop turns “trust the model” into “trust the verifier”, with the caveat that the verifier only vouches for what it actually checks. It’s the same reason tsc-checked TypeScript generated by Claude or Codex feels much safer than free-form Python.

3. “Reasoning models” doing original reasoning now has an empirical existence proof. A reasoning model has produced a publishable mathematical result that human experts hadn’t found; whether the same regime extends to fuzzier domains (product strategy, system design) is wide open. For comparison-shopping the current public-API reasoning models, our Gemini 3.5 Flash vs Claude Haiku 4.5 deep dive covers the trade-offs on real builder workloads.
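For the generated-SQL case in point 2, the verifier-in-the-loop pattern can be sketched in a few lines of Python. This is a minimal sketch assuming SQLite and the standard-library `sqlite3` module; the schema, the helper names `verify_sql` and `first_verified`, and the hard-coded candidates (standing in for model output) are all hypothetical. Prefixing a statement with `EXPLAIN` makes SQLite parse it and resolve every table and column name without running the query, so a hallucinated column is rejected deterministically.

```python
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE users  (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     user_id INTEGER REFERENCES users(id),
                     total REAL);
"""


def verify_sql(sql: str, schema: str = SCHEMA) -> tuple[bool, str]:
    """Check that `sql` parses and only references names defined in `schema`.

    Uses a throwaway in-memory database; EXPLAIN compiles the statement
    without executing it, so no data is read or written.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        conn.execute("EXPLAIN " + sql)
        return True, "ok"
    except sqlite3.Error as exc:  # syntax errors, unknown tables/columns, ...
        return False, str(exc)
    finally:
        conn.close()


def first_verified(candidates, verify=verify_sql):
    """Return the first candidate the verifier accepts, plus the rejections."""
    rejected = []
    for sql in candidates:
        ok, reason = verify(sql)
        if ok:
            return sql, rejected
        rejected.append((sql, reason))
    return None, rejected


if __name__ == "__main__":
    # Stand-ins for model output: the first invents an `amount` column.
    candidates = [
        "SELECT email, SUM(amount) FROM orders JOIN users "
        "ON users.id = orders.user_id GROUP BY email",
        "SELECT email, SUM(total) FROM orders JOIN users "
        "ON users.id = orders.user_id GROUP BY email",
    ]
    sql, rejected = first_verified(candidates)
    print(sql)
    print(rejected[0][1])  # no such column: amount
```

Returning the rejection reasons, not just a pass/fail flag, is a deliberate choice: one option is to feed the verifier’s error message back to the model on the next attempt.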

What’s still unclear

OpenAI did not publicly name or version the internal model, did not disclose how many attempts or how much compute went into producing the result, and did not commit to releasing it via API. The Lean formalisation also does not, by itself, establish how general the model’s capability is: disproving one famous conjecture is not the same as being able to attack arbitrary open problems on demand.

Sources

OpenAI