News from Anthropic Red (safety)

Anthropic Red (safety) Frontier Labs December 18, 2025

Project Vend: Phase Two

In June, we revealed that we'd set up a small shop in our San Francisco office run by an AI shopkeeper. It did not do particularly…

Anthropic Red (safety) Frontier Labs December 1, 2025

AI Agents Find Smart Contract Exploits

We evaluated AI agents' ability to exploit smart contracts using a new benchmark comprising contracts that were actually exploited. On contracts exploited after the latest knowledge…

Anthropic Red (safety) Frontier Labs November 12, 2025

Project Fetch

How could frontier AI models like Claude reach beyond computers and affect the physical world? One path is through robots. We ran an experiment to see…

Anthropic Red (safety) Frontier Labs September 29, 2025

Building AI for Cyber Defenders

We invested in improving Claude's ability to help defenders detect, analyze, and remediate vulnerabilities in code and deployed systems. This work allowed Claude Sonnet 4.5 to…

Anthropic Red (safety) Frontier Labs September 5, 2025

LLMs and Biorisk

Our work at Anthropic is animated by the potential for AI to advance scientific discovery—especially in biology and medicine. At the same time, AI is fundamentally…

Anthropic Red (safety) Frontier Labs August 21, 2025

Developing Nuclear Safeguards for AI

Together with the NNSA and DOE national laboratories, we have co-developed a classifier—an AI system that automatically categorizes content—that distinguishes between concerning and benign nuclear-related conversations…

Anthropic Red (safety) Frontier Labs August 9, 2025

Claude Does Cyber Competitions

Throughout 2025, we have been quietly entering Claude in cybersecurity competitions designed primarily for humans. In many of these competitions Claude did pretty well, often placing…

Anthropic Red (safety) Frontier Labs July 15, 2025

Cyber Evaluations of Claude 4

We partnered with Pattern Labs on a range of cybersecurity evaluations of Claude Opus 4 and Claude Sonnet 4, with Opus demonstrating especially notable improvement over…

Anthropic Red (safety) Frontier Labs June 27, 2025

Project Vend

We let Claude manage an automated store in our office as a small business for about a month. We learned a lot about the plausible, strange,…

Anthropic Red (safety) Frontier Labs June 13, 2025

Cyber Toolkits for LLMs

Large Language Models (LLMs) that are not fine-tuned for cybersecurity can succeed in multistage attacks on networks with dozens of hosts when equipped with a novel…
