Harbor v0.4.19 adds one-command launcher for Claude Code, Codex, and Opencode against local llama.cpp / vLLM

Harbor's new `launch` command boots an agent runner (Claude Code, Codex CLI, Opencode, Copilot, mi) wired straight to a local Ollama / llama.cpp / vLLM backend. The release also ships an Anthropic Messages and OpenAI Responses compat layer in Boost, plus an ik_llama.cpp service — closing the gap between local inference stacks and the agent CLIs builders actually use.

Harbor — the one-command local LLM stack maintained by av — shipped v0.4.19 on May 26, and the headline is a new harbor launch subcommand. It boots an agent CLI (Claude Code, Codex, Opencode, Copilot, mi) pre-wired to a local inference backend (Ollama, llama.cpp, vLLM), without you writing any of the usual glue.

What changed in v0.4.19

Per the release notes:

harbor launch command — start a host agent tool with adapters that point it at a Harbor-managed backend. Specify --backend llamacpp and --model <id> explicitly, or let it auto-detect the first available service.
Boost compat layers — Boost (Harbor’s request-shaping proxy) now exposes both an Anthropic-compatible Messages API and an OpenAI Responses API. That’s the prerequisite for Claude Code and Codex CLI to talk to a local backend as if it were the real cloud endpoint.
ik_llama.cpp service — first-class wrapper for the ik_llama.cpp fork (Ivan Komarov’s iquantization-friendly branch).
Boost hardening — streaming fixes, SDK compat audit across providers, request-payload validation.

Harbor architecture diagram showing CLI → services → backends — Harbor's architecture (source: av/harbor wiki). The v0.4.19 launch command sits at the top, picking a host tool and wiring it into the service layer.

What this means if you’re building agents on local models

Before v0.4.19, the workflow to use Claude Code against a local model looked like:

Boot llama.cpp server (or vLLM) on a specific port with the right flags.
Stand up a separate compat shim that mimics Anthropic’s /v1/messages schema (community projects like claude-bridge, or hand-rolled FastAPI).
Export ANTHROPIC_BASE_URL / ANTHROPIC_API_KEY env vars to point Claude Code at the shim.
Hope the SDK’s quirks (streaming chunk format, tool-call schema, token accounting) survive the shim.

After v0.4.19 it’s one line:

harbor launch claude --backend llamacpp --model qwen3-7-32b

Harbor handles the env-var dance, the compat layer, the backend lifecycle. Same shape works for Codex (harbor launch codex), Opencode, Copilot, mi.

That collapses the “I’d like to dogfood my local model with my actual agent CLI” project from a weekend of plumbing to one shell call. For builders who’ve been running agents-on-local in production — see our DeepSeek Reasonix coverage on cache-friendly agent design — this is the kind of infrastructure release that flattens the on-ramp for everyone else.

Honest caveats

Compat ≠ identical. Claude Code’s tool-call protocol has changed several times in 2026, and “Anthropic-compatible” depends on which version of the SDK you’re running. Harbor’s Boost layer is best-effort, not a guarantee. Test on a known-good prompt before assuming parity.
Local model quality matters more than the plumbing. Running Claude Code against a local 7B isn’t going to feel like Sonnet 4.6 no matter how clean the wrapper is. Use Harbor as a way to test models, not as a free Claude Code stand-in.
ik_llama.cpp isn’t merged upstream. ik_llama.cpp uses non-standard quantization formats (IQ1/IQ2/IQ3) optimized for memory bandwidth on Mac and Intel. If you need GGUF interop with mainline llama.cpp tools, confirm format compatibility first.

Where to go next

Harbor docs: github.com/av/harbor — harbor --help after install lists every supported agent + backend combo.
Pairing this with a fast local model? See our Qwen 3.7 Max coverage for one of the strongest open-weight options to point Harbor at, and the Qwen 3.7 open-weight watch for what to expect next from the open-weight side.
Want a deep cache pattern that survives multi-turn agent workflows? DeepSeek Reasonix is the reference architecture worth borrowing into your Harbor setup.

Sources

GitHub releases — av/harbor v0.4.19
av/harbor README — full host tool / backend matrix
vLLM docs — Claude Code integration
Unsloth — How to run local LLMs with Claude Code — comparable manual setup before Harbor’s launcher