PrismML's Bonsai Image 4B is the first sub-1 GB diffusion model to run on iPhone: 1-bit (0.93 GB) and ternary (1.21 GB) variants, Apache 2.0

PrismML released Bonsai Image 4B on May 26: a 4B-parameter text-to-image diffusion transformer compressed 8.3× relative to FP16 via 1-bit quantization (0.93 GB) or 6.4× via ternary quantization (1.21 GB). It generates a 512×512 image in about 6 s on an M4 Pro Mac and 9.4 s on an iPhone 17 Pro Max while retaining up to 95% of full-precision quality, by PrismML's own measure. Apache 2.0; weights on Hugging Face, code on GitHub.

PrismML, the lab behind the Ternary Bonsai 1.58-bit text models released in April, shipped Bonsai Image 4B yesterday. It’s a text-to-image diffusion transformer in the 4B-parameter class that runs at either 1-bit (binary) or 1.58-bit (ternary, weights ∈ {-1, 0, +1}) precision; ternary is called "1.58-bit" because a three-valued weight carries log₂3 ≈ 1.58 bits of information. The pitch: close to full-precision image-generation quality in a model small enough to live entirely on a phone.
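To make the binary vs ternary distinction concrete, here is a minimal sketch that quantizes a small weight vector both ways in plain Python. It assumes a single per-tensor scale equal to the mean absolute weight (the "absmean" recipe popularized by BitNet b1.58); PrismML has not said this is its method, and real schemes may use per-group scales or quantization-aware training rather than rounding a trained model after the fact.

```python
import math

def absmean_scale(weights):
    """Per-tensor scale: the mean absolute weight (assumed recipe, not PrismML's confirmed one)."""
    return sum(abs(w) for w in weights) / len(weights)

def quantize_binary(weights):
    """1-bit: keep only the sign of each weight, plus one shared scale."""
    return [1 if w >= 0 else -1 for w in weights], absmean_scale(weights)

def quantize_ternary(weights, eps=1e-8):
    """1.58-bit: divide by the scale, round, and clip to {-1, 0, +1}."""
    scale = absmean_scale(weights)
    return [max(-1, min(1, round(w / (scale + eps)))) for w in weights], scale

def dequantize(codes, scale):
    """Approximate reconstruction of the original weights: code * scale."""
    return [c * scale for c in codes]

w = [0.42, -0.07, 0.15, -0.61, 0.02, 0.33]
b_codes, b_scale = quantize_binary(w)
t_codes, t_scale = quantize_ternary(w)
print(b_codes)  # [1, -1, 1, -1, 1, 1]
print(t_codes)  # [1, 0, 1, -1, 0, 1]
print(dequantize(t_codes, t_scale))
print(f"bits of information per ternary weight: {math.log2(3):.3f}")  # 1.585
```

The extra zero code lets small weights (here -0.07 and 0.02) drop out entirely instead of being forced to ±scale as in the binary version.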

The numbers

Per PrismML’s release post and the PR Newswire announcement:

Variant	Size	Compression vs FP16	Quality retention
Binary (1-bit) Bonsai Image 4B	0.93 GB	8.3×	up to 95%
Ternary (1.58-bit) Bonsai Image 4B	1.21 GB	6.4×	up to 95%
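The two rows can be cross-checked with a little arithmetic. This sketch assumes the compression factors are measured against an FP16 checkpoint (2 bytes per parameter) and that "GB" means 10⁹ bytes; the release doesn't spell out either.

```python
# Size and compression figures from PrismML's table above.
VARIANTS = {
    "binary (1-bit)": {"size_gb": 0.93, "compression": 8.3},
    "ternary (1.58-bit)": {"size_gb": 1.21, "compression": 6.4},
}

FP16_BYTES_PER_PARAM = 2  # assumption: compression is quoted against an FP16 checkpoint
FP16_BITS = 16

results = {}
for name, v in VARIANTS.items():
    fp16_gb = v["size_gb"] * v["compression"]   # implied uncompressed size
    params_b = fp16_gb / FP16_BYTES_PER_PARAM   # billions of params (GB = 1e9 bytes)
    avg_bits = FP16_BITS / v["compression"]     # average stored bits per weight
    results[name] = (fp16_gb, params_b, avg_bits)
    print(f"{name}: FP16 ~{fp16_gb:.2f} GB, ~{params_b:.2f}B params, "
          f"~{avg_bits:.2f} bits/weight on average")
```

Both rows point back to roughly the same ~7.7 GB FP16 baseline (about 3.9B parameters, consistent with the "4B class" label). The implied averages of about 1.93 and 2.5 bits per weight sit above the nominal 1 and 1.58 bits, which is what you would expect if some components stay at higher precision or scale factors are stored alongside the weights; PrismML hasn't published that breakdown.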

Throughput on real hardware:

Apple M4 Pro Mac: ~6 seconds per 512×512 image
iPhone 17 Pro Max: ~9.4 seconds per 512×512 image
CUDA GPUs: also supported (PrismML doesn’t publish exact numbers yet)

License is Apache 2.0 — both weights and inference code. Weights are on the Hugging Face Bonsai Image collection, with MLX 2-bit / 1-bit, GemLite 2-bit / 1-bit, and unpacked variants.
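The "2-bit" labels hint at a practical detail: a ternary weight needs only log₂3 ≈ 1.58 bits in theory, but is easiest to store in a 2-bit field. The sketch below compares two packings in plain Python: four codes per byte in 2-bit fields, versus five codes per byte in base 3 (3⁵ = 243 ≤ 256, so 1.6 bits per weight). It assumes the 2-bit variants hold ternary codes, which the collection listing doesn't confirm, and it does not describe the actual MLX or GemLite memory layouts.

```python
def pack_2bit(codes):
    """Pack ternary codes {-1, 0, +1} into 2-bit fields, four per byte."""
    out = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, c in enumerate(codes[i:i + 4]):
            byte |= (c + 1) << (2 * j)  # map -1, 0, +1 -> 0, 1, 2
        out.append(byte)
    return bytes(out)

def unpack_2bit(data, n):
    """Inverse of pack_2bit; n trims padding in the last byte."""
    codes = []
    for byte in data:
        for j in range(4):
            codes.append(((byte >> (2 * j)) & 0b11) - 1)
    return codes[:n]

def pack_base3(codes):
    """Pack five ternary codes per byte as base-3 digits (max value 242)."""
    out = bytearray()
    for i in range(0, len(codes), 5):
        value = 0
        for c in reversed(codes[i:i + 5]):
            value = value * 3 + (c + 1)
        out.append(value)
    return bytes(out)

def unpack_base3(data, n):
    """Inverse of pack_base3; the least significant digit is the first code."""
    codes = []
    for value in data:
        for _ in range(5):
            codes.append(value % 3 - 1)
            value //= 3
    return codes[:n]

codes = [1, 0, 1, -1, 0, 1, -1, -1, 0]
packed2 = pack_2bit(codes)
packed3 = pack_base3(codes)
assert unpack_2bit(packed2, len(codes)) == codes
assert unpack_base3(packed3, len(codes)) == codes
print(len(packed2), len(packed3))  # 3 2
```

The 2-bit layout wastes one of four code points but decodes with shifts and masks, while base 3 saves 20% of storage at the cost of division and modulo on decode. Which trade-off PrismML's kernels make is not documented in the release.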

Want to actually run it? Our step-by-step guide to running Bonsai Image 4B locally walks through the setup script, which quant to download, the one-shot generate command, and the local Studio (web UI + API), on both Mac and NVIDIA PCs.

Comparison grid from PrismML’s release page (source: prismml.com): the same prompts rendered by Bonsai at 1-bit, ternary, and full precision. The visual gap is small enough that the up-to-95% quality-retention claim is at least plausible at thumbnail scale.

“Local image generation is the next major milestone for creative AI. With Bonsai Image 4B, we are showing that high-quality diffusion models can run closer to users without giving up the capabilities that make modern image generation useful.” — Babak Hassibi, PrismML CEO and co-founder, in the PR Newswire announcement

Why this matters if you’re building image features

Three implications for builders worth thinking through:

1. The “diffusion needs a GPU” assumption is breaking. Until now, builders shipping image generation in a product had two realistic options: (a) call OpenAI, Stability, or Google Imagen APIs at $0.02–$0.04 per image, or (b) self-host SDXL or Flux on a rented GPU at ~$0.50/hour. Bonsai Image 4B opens a third path: ship the model inside your iOS, macOS, or desktop app and pay no per-image inference cost. The ~6 s latency on an M4 Pro (9.4 s on iPhone) can trail a hosted API, but it is acceptable for “user clicks generate and waits” UX.

2. Apache 2.0 is the unlock. PrismML’s Ternary Bonsai text models and now Bonsai Image both ship under Apache 2.0, a permissive license that allows commercial use without the use-based restrictions of RAIL-style licenses. That contrasts with Stable Diffusion’s CreativeML OpenRAIL licenses and with the non-commercial license on Flux.1 [dev] (of the open Flux.1 models, only [schnell] is Apache 2.0). For SaaS startups embedding image generation in a product, the legal review just got shorter.

3. It cross-pollinates with the on-device LLM stories we’ve been tracking. If you’re already shipping a llama.cpp-based local LLM or hooking your agent CLI into a local backend via Harbor, Bonsai Image fits the same product thesis: AI workloads on the user’s device, not in the cloud. With both text and image generation running locally, the economics can finally work end-to-end.
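The cost comparison in point 1 can be roughed out using only the figures quoted there. This sketch assumes the rented GPU is kept fully busy and ignores electricity, battery drain, idle time, thermals, and engineering cost, so treat it as a back-of-the-envelope check rather than a pricing model.

```python
API_COST_PER_IMAGE = (0.02, 0.04)  # USD per image, range quoted above
GPU_COST_PER_HOUR = 0.50           # USD per hour for a rented GPU, quoted above
SECONDS_PER_IMAGE = {"M4 Pro Mac": 6.0, "iPhone 17 Pro Max": 9.4}  # PrismML's figures

def breakeven_images_per_hour(api_cost, gpu_hourly=GPU_COST_PER_HOUR):
    """Sustained images/hour at which a fully utilized rented GPU matches API pricing."""
    return gpu_hourly / api_cost

def images_per_hour(seconds_per_image):
    """Upper bound for back-to-back generation, ignoring thermals and app overhead."""
    return 3600 / seconds_per_image

for cost in API_COST_PER_IMAGE:
    print(f"API ${cost:.2f}/image -> GPU must exceed "
          f"{breakeven_images_per_hour(cost):.1f} images/hour")

for device, secs in SECONDS_PER_IMAGE.items():
    print(f"{device}: at most ~{images_per_hour(secs):.0f} images/hour, $0 per image")
```

A $0.50/hour GPU only undercuts $0.02–$0.04 API pricing if it sustains more than about 12.5–25 images per hour. On-device, the cloud bill is zero and the binding constraint becomes latency: roughly 600 images/hour back-to-back at the M4 Pro figure and about 383 at the iPhone figure.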

Honest caveats

“Up to 95% quality retention” is PrismML’s number, not a third-party measurement. PrismML doesn’t yet publish FID, CLIP-score, or human-eval results in a table directly comparable to SDXL or Flux. Treat the figure as directionally encouraging until independent evals land.
4B is modest for text-to-image. Flux.1 [dev] is 12B; SDXL’s UNet is only 2.6B, but it pairs that with two CLIP text encoders. Bonsai’s quality ceiling will be lower than that of flagship cloud models: the win is “good enough on-device,” not “state of the art.”
iPhone inference uses MLX. The MLX-2bit variant on the Hugging Face collection requires iOS 18+ and an Apple Silicon device. There’s no Android port published yet.
The comparison grid and sample images on PrismML’s page are curated. Cherry-picked samples typically look stronger than what you’ll see across a random prompt set, so validate on your own use cases before betting a product on this.
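The parameter-count caveat also has a memory side. This rough sketch computes raw weight storage at FP16 versus 1 bit for the parameter counts cited in the caveats; it ignores scale factors, anything not captured in those counts, activations, and runtime overhead, which likely accounts for part of the gap between the naive 1-bit figure for a 4B model and Bonsai’s shipped 0.93 GB file.

```python
def weight_storage_gb(params_billion, bits_per_weight):
    """Raw weight storage in GB (1e9 bytes): params x bits / 8 bits per byte."""
    return params_billion * bits_per_weight / 8

# Parameter counts (billions) as cited in the caveats above.
MODELS = {"Bonsai Image 4B": 4.0, "SDXL UNet": 2.6, "Flux.1 [dev]": 12.0}

for name, params in MODELS.items():
    print(f"{name}: {weight_storage_gb(params, 16):.1f} GB at FP16, "
          f"{weight_storage_gb(params, 1):.2f} GB at 1 bit")
```

Even at 1 bit, a 12B model like Flux.1 [dev] would need roughly 1.5 GB for weights alone, about 3× the naive figure for a 4B model.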

Where to go next

Weights and code: Hugging Face — prism-ml/bonsai-image collection, GitHub — PrismML-Eng/Bonsai-image-demo
For context on PrismML’s quantization approach, see their Ternary Bonsai 8B text-model whitepaper
For on-device LLM patterns that pair well with on-device image-gen: our llama.cpp built-in tools coverage and Harbor v0.4.19 local agent launcher