Anthropic open-sourced its vulnerability-discovery harness — you can run Claude's recon → find → patch loop on your own code

Anthropic published defending-code-reference-harness, a reference implementation that runs autonomous vulnerability discovery and patching with Claude. It ships as Claude Code skills plus a sandboxed pipeline that finds C/C++ memory bugs, verifies the crashes, and proposes fixes. Here's what it changes if you ship code — and the catch the README is upfront about.

Anthropic put a reference implementation for autonomous vulnerability discovery and remediation with Claude on GitHub, under anthropics/defending-code-reference-harness. It hit the Hacker News front page this week. (Source: Hacker News) It is the DIY counterpart to Anthropic’s hosted Claude Security product: the same recon → find → verify → patch loop, but as open-source code you run yourself.

Key facts:

It ships as two parts: interactive Claude Code skills and an autonomous pipeline. (Source: repo README)
The skills are /quickstart, /threat-model, /vuln-scan, /triage, /patch, and /customize. The first four only read and write files — safe to run unsandboxed.
The autonomous pipeline finds C/C++ memory bugs using Docker and ASAN, the address sanitizer.
Every autonomous agent runs inside a gVisor sandbox with egress restricted to the Claude API. The pipeline refuses to start outside that sandbox unless explicitly overridden.
It works with Claude through the Anthropic API, Bedrock, Vertex, or Azure.
The repo is not maintained and not accepting contributions — Anthropic calls it a reference, not a product.
It is built on learnings from Project Glasswing partnerships since the Claude Mythos Preview launch.

GitHub repository card for anthropics/defending-code-reference-harness — The public repo: skills for threat modeling, scanning, triage, and patching, plus an autonomous harness you can `/customize`. (Source: Anthropic / GitHub)

How the loop actually works

The autonomous pipeline walks through seven stages (Source: repo README):

Build compiles the target into a Docker image with ASAN.
Recon reads the source and partitions it into separate input-parsing subsystems so parallel agents attack different areas.
Find spins up N agents in isolated containers, each crafting malformed inputs until one crashes the ASAN binary 3 out of 3 times.
Verify hands a fresh grader agent only the proof of concept and makes it reproduce the crash in a container the find agent never touched.
Dedupe decides whether each verified crash is new, a better example of a known bug, or a duplicate.
Report writes a structured exploitability analysis.
Patch generates a fix and confirms the code builds, the original crash is gone, the test suite still passes, and a fresh find agent can’t route around it.
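The "3 out of 3" rule in the Find stage is easy to picture as code. The following is a minimal sketch, not the harness's own implementation: the function name reproduces, the run_poc callback, and the STAGES list are hypothetical names introduced here, and a real run would launch the ASAN binary in a container rather than call a Python function.

```python
from typing import Callable

# Stage order as described in the article; the names are illustrative only.
STAGES = ["build", "recon", "find", "verify", "dedupe", "report", "patch"]


def reproduces(run_poc: Callable[[], bool], required_runs: int = 3) -> bool:
    """Return True only if the proof of concept crashes on every attempt.

    run_poc is a hypothetical callback that runs the input once and
    returns True when the target crashed.
    """
    # all() stops at the first non-crashing run, so a flaky input is rejected early.
    return all(run_poc() for _ in range(required_runs))


if __name__ == "__main__":
    # Simulated outcomes stand in for real container runs.
    flaky_outcomes = iter([True, False, True])
    print(reproduces(lambda: True))                  # True
    print(reproduces(lambda: next(flaky_outcomes)))  # False
```

Requiring every run to crash, rather than a majority, filters out inputs that only fail intermittently before they reach the Verify stage.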

Diagram of the harness pipeline stages: Build, Recon, Find, Grade, Judge, Report, Dedup — Anthropic's diagram of the reference pipeline — execution-verified findings, not just static analysis. (Source: Anthropic / GitHub)

What this means if you ship code

The interesting move is the on-ramp. The README lays out a paced plan: Day 1 is a threat model plus a static scan and triage using only the read-only skills; Day 2 runs the sandboxed pipeline on a known-vulnerable C/C++ library; Days 3–5 port it to your stack with /customize; Week 2 is autonomous scanning across your repos. You point it at your own service, answer three questions — what signals a finding, what a proof of concept looks like, how the target builds — and the generic shape adapts beyond C/C++.

If you already run Claude as your coding agent, this slots into the same workflow. The whole harness is driven from Claude Code, so the token-metering and approval discipline from running Claude Code as a daily driver applies here too. And it shares lineage with the model behind the first public M5 kernel exploit — see how Claude Mythos broke Apple’s MIE in five days.

The catch

The README is unusually honest: autonomous triage and patching are still open problems this harness does not fully solve, and verified patches are not always upstreamable — budget real engineering time for them. Cost is the other wall. HN commenters estimated roughly 10K uncached input tokens and 2K output tokens per minute per agent, projecting hundreds to thousands of dollars a month at scale. (Source: Hacker News) Findings are non-deterministic, so coverage comes from running more parallel waves — which means spending more. This is a starting jig you sharpen, not a scanner you switch on.
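To see how the commenters' per-minute figures turn into a monthly bill, here is a back-of-the-envelope sketch. The token rates (10K input, 2K output per minute per agent) come from the Hacker News estimate above; everything else is an assumption for illustration, including the agent count, the hours per day, and especially the per-million-token prices, which are placeholders and not Anthropic's actual pricing.

```python
def monthly_cost_usd(
    agents: int,
    hours_per_day: float,
    days: int,
    input_price_per_mtok: float,   # placeholder price, USD per million input tokens
    output_price_per_mtok: float,  # placeholder price, USD per million output tokens
    input_tok_per_min: int = 10_000,  # HN commenters' estimate, uncached input
    output_tok_per_min: int = 2_000,  # HN commenters' estimate, output
) -> float:
    """Estimate monthly spend from per-agent, per-minute token rates."""
    agent_minutes = agents * hours_per_day * 60 * days
    input_cost = agent_minutes * input_tok_per_min / 1_000_000 * input_price_per_mtok
    output_cost = agent_minutes * output_tok_per_min / 1_000_000 * output_price_per_mtok
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical workload: 4 agents, 2 hours a day, 30 days,
    # at placeholder prices of $3 (input) and $15 (output) per million tokens.
    estimate = monthly_cost_usd(4, 2, 30, 3.0, 15.0)
    print(round(estimate))  # 864 under these placeholder numbers
```

Because cost scales linearly with agent-minutes, doubling the number of parallel waves doubles the bill, which is why coverage and spend move together.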

Sources

anthropics/defending-code-reference-harness — Anthropic / GitHub
Anthropic’s open-source framework for AI-powered vulnerability discovery — Hacker News discussion
Claude Security — Anthropic, managed product
Project Glasswing — Anthropic