Measuring LLMs' Impact on N-day Exploits
In cybersecurity, a large fraction of real-world harm comes from N-days: vulnerabilities that have already been publicly disclosed but have been patched on only some devices. In this…
We’ve spent the past year investigating how threat actors are weaponizing AI to conduct cyber operations. Today, we’re sharing a new analysis that maps these real-world…
On two new, challenging academic benchmarks measuring AI models’ ability to develop exploits (ExploitBench and ExploitGym) and an updated version of the benchmark measuring smart contract…
Claude Mythos Preview is a new general-purpose language model that is strikingly capable at computer security tasks. This post provides technical details for researchers and practitioners…
This post dives deep into how Claude wrote an exploit for one of the vulnerabilities it found in Firefox.
In a collaboration with researchers at Mozilla, Claude Opus 4.6 discovered 22 Firefox vulnerabilities over the course of two weeks.
AI models can now find high-severity vulnerabilities at scale. This is a moment to empower defenders. We're now using Claude to find and help fix vulnerabilities…
A recent evaluation of AI models’ cyber capabilities found that current Claude models can now succeed at multistage attacks on networks with dozens of hosts using only…
Ensuring that programs are bug-free is one of the most challenging aspects of software engineering. We developed an agent that can efficiently identify bugs in large…
AI could help defenders of critical infrastructure identify the vulnerabilities that attackers might exploit—and close them before they are exploited. Anthropic has partnered with Pacific Northwest…