Models · · 2 min read

NuExtract 3 released — a 4B open-weight VLM that beats 9B Qwen at structured extraction (Apache 2.0)

NuMind shipped NuExtract 3, a 4B vision-language model fine-tuned on Qwen3.5-4B for structured extraction + image-to-Markdown. Apache 2.0, 131K context, beats Qwen3.5-9B on NuMind's internal 600-doc benchmark. The best open-weight document-understanding model that fits on a single consumer GPU.


NuMind released NuExtract 3 today — a 4B open-weight vision-language model purpose-built for document understanding: structured JSON extraction from text or images, plus image-to-Markdown conversion. It’s already trending on r/LocalLLaMA, and the benchmark numbers explain why.

What it actually is

  • Size: 4B parameters, fine-tuned on Qwen3.5-4B base
  • License: Apache 2.0 (commercial use allowed, no strings)
  • Context: 131,072 tokens
  • Inputs: text, images, or text + images (multimodal, multilingual)
  • Modes: reasoning and non-reasoning (toggle with enable_thinking)
  • Two main pipelines:
    • Structured extraction → text/image + JSON template + instructions → JSON output
    • Image-to-Markdown → text/image → clean Markdown

The benchmark that matters

On NuMind’s internal structured-extraction benchmark across ~600 diverse documents (invoices, movie posters, floor plans, contracts):

ModelScoreFailures
NuExtract 3 (4B)0.651 ± 0.01927
gemma-4-E4B-it0.53831
Qwen3.5-9B0.479170
Qwen3.5-4B (base)0.417229
GLM-4.6V-Flash0.435153
Ministral-3-3B0.240344

The headline: NuExtract 3 (4B) beats Qwen3.5-9B by 17 points on this task while being less than half the size. The fine-tune is doing real work — this is what a domain-targeted post-training run looks like when you have ~600 hand-labeled documents from real customers.

Why builders should care

This is the first open-weight model in 2026 that I’d seriously consider deploying against GPT-4o / Claude / Gemini Flash for document extraction workloads:

  • Self-hostable: 4B fits comfortably on a 12-16GB consumer GPU. vLLM deployment script is in the model card.
  • Commercial license: Apache 2.0 means no procurement headaches.
  • Real benchmark, not just synthetic: 600 documents including invoices and floor plans is the kind of long-tail mess that breaks general VLMs.
  • Two-mode inference: reasoning mode for hard docs, non-reasoning for production throughput.

For comparison-shopping the closed-model alternatives that would compete on the same workload, see our Gemini 3.5 Flash vs Claude Haiku 4.5 deep dive — these are the API-tier models NuExtract 3 is positioning itself to displace for self-hosted use cases.

What’s still unclear

NuMind didn’t publish a direct comparison against GPT-4o or Claude on the same 600-doc benchmark, which is the comparison that would actually settle the “use the API or self-host?” question. The image-to-Markdown evaluation was scored via Gemini 3 Flash as judge on 100 complex documents — useful, but not the same as side-by-side human eval against closed-model OCR.

Still, for any builder running document-extraction pipelines today on the OpenAI / Anthropic / Google APIs: spinning up NuExtract 3 on your own hardware for a weekend and benchmarking against your real document mix is now a 1-hour exercise instead of a 1-week project. That’s the actual news.

Sources

Source: Hugging Face — numind/NuExtract3