NuExtract 3 released — a 4B open-weight VLM that beats 9B Qwen at structured extraction (Apache 2.0)

NuMind shipped NuExtract 3, a 4B vision-language model fine-tuned on Qwen3.5-4B for structured extraction + image-to-Markdown. Apache 2.0, 131K context, beats Qwen3.5-9B on NuMind's internal 600-doc benchmark. The best open-weight document-understanding model that fits on a single consumer GPU.

NuMind released NuExtract 3 today — a 4B open-weight vision-language model purpose-built for document understanding: structured JSON extraction from text or images, plus image-to-Markdown conversion. It’s already trending on r/LocalLLaMA, and the benchmark numbers explain why.

What it actually is

Size: 4B parameters, fine-tuned on Qwen3.5-4B base
License: Apache 2.0 (commercial use allowed, no strings)
Context: 131,072 tokens
Inputs: text, images, or text + images (multimodal, multilingual)
Modes: reasoning and non-reasoning (toggle with enable_thinking)
Two main pipelines:
- Structured extraction → text/image + JSON template + instructions → JSON output
- Image-to-Markdown → text/image → clean Markdown

The benchmark that matters

On NuMind’s internal structured-extraction benchmark across ~600 diverse documents (invoices, movie posters, floor plans, contracts):

Model	Score	Failures
NuExtract 3 (4B)	0.651 ± 0.019	27
gemma-4-E4B-it	0.538	31
Qwen3.5-9B	0.479	170
Qwen3.5-4B (base)	0.417	229
GLM-4.6V-Flash	0.435	153
Ministral-3-3B	0.240	344

The headline: NuExtract 3 (4B) beats Qwen3.5-9B by 17 points on this task while being less than half the size. The fine-tune is doing real work — this is what a domain-targeted post-training run looks like when you have ~600 hand-labeled documents from real customers.

Why builders should care

This is the first open-weight model in 2026 that I’d seriously consider deploying against GPT-4o / Claude / Gemini Flash for document extraction workloads:

Self-hostable: 4B fits comfortably on a 12-16GB consumer GPU. vLLM deployment script is in the model card — and our step-by-step guide to running NuExtract 3 locally walks through the vLLM and Transformers setup, the JSON template language, and PDF/image extraction.
Commercial license: Apache 2.0 means no procurement headaches.
Real benchmark, not just synthetic: 600 documents including invoices and floor plans is the kind of long-tail mess that breaks general VLMs.
Two-mode inference: reasoning mode for hard docs, non-reasoning for production throughput.

For comparison-shopping the closed-model alternatives that would compete on the same workload, see our Gemini 3.5 Flash vs Claude Haiku 4.5 deep dive — these are the API-tier models NuExtract 3 is positioning itself to displace for self-hosted use cases.

What’s still unclear

NuMind didn’t publish a direct comparison against GPT-4o or Claude on the same 600-doc benchmark, which is the comparison that would actually settle the “use the API or self-host?” question. The image-to-Markdown evaluation was scored via Gemini 3 Flash as judge on 100 complex documents — useful, but not the same as side-by-side human eval against closed-model OCR.

Still, for any builder running document-extraction pipelines today on the OpenAI / Anthropic / Google APIs: spinning up NuExtract 3 on your own hardware for a weekend and benchmarking against your real document mix is now a 1-hour exercise instead of a 1-week project. That’s the actual news.

Sources

NuExtract 3 on Hugging Face — model card, benchmarks, deployment configs
NuMind GitHub — numindai/nuextract
NuExtract platform homepage
r/LocalLLaMA discussion thread