7 minutes to master any new AI
Fresh, hands-on tutorials for every new AI release.
New model dropped? New tool launched? We test it, screenshot it, and turn it into a 7-minute guide you can actually use — never a press release rewrite.
Hand-tested by a builder · Live within 24h of every release · 59 guides & briefs shipped
Tutorials
How to do it: we test each release and write the step-by-step guide you can follow.
-
How to Make Gemma 4 Run up to 2x Faster Locally: Multi-Token Prediction (MTP) + QAT
Two free speedups for local Gemma 4 that people keep confusing. Quantization-aware training (QAT) cuts memory ~72%; multi-token prediction (MTP) speeds up decode throughput by up to 2x, and MTP just landed in mainline llama.cpp (merged June 7, 2026). Covers the exact Ollama, llama.cpp, and Transformers commands, what speedup to actually expect, and which sizes are supported. Sourced from the merged PR and Google's own docs.
-
How to Run Gemma 4 12B Locally: Ollama, llama.cpp & Transformers (Text, Image, Audio)
Gemma 4 12B is Google's encoder-free open model that runs text, image, and audio on a 16GB laptop under Apache 2.0. The exact Ollama, llama.cpp/GGUF, and Transformers setup — including how to pass images and native audio — all from the official model card and Unsloth docs. Copy-paste safe.
-
How to Run Bonsai Image 4B Locally: On-Device Text-to-Image on Mac & PC
Bonsai Image 4B is a ternary/1-bit diffusion model that fits in ~1 GB and generates a 512×512 image in ~6 s on an M4 Pro — fully on-device, Apache 2.0, zero per-image cost. The exact setup, which quant to download, the one-shot CLI, and the local Studio server, all from PrismML's official demo repo.
-
How to Run NuExtract 3 Locally: vLLM, Templates & Document Extraction
NuExtract 3 is a 4B open-weight VLM that pulls structured JSON out of any document — invoices, receipts, contracts, PDFs — and runs on a single 16GB GPU. The exact vLLM and Transformers setup, the JSON template language, image and multi-page PDF extraction, and document-to-Markdown — all from the official model card, copy-paste safe.
-
Run Qwen3.6-35B-A3B Locally for Coding: llama.cpp, Quants & VRAM
Qwen3.6-35B-A3B activates just 3B of its 35B parameters per token but scores 73.4% on SWE-bench Verified, and it runs on a 24GB GPU or a 32GB Mac. The exact llama.cpp setup, which quant to pick, the VRAM math, and how to wire it into a coding agent.
-
Claude Code as a Daily Driver: CLAUDE.md, Skills, Subagents, Plugins & MCP
Most people use Claude Code like a chat box. The five features that turn it into a daily driver — persistent CLAUDE.md memory, reusable Skills, delegated Subagents, Plugins, and MCP servers — with the exact files, commands, and gotchas for each.