News from METR · AI Feed

METR Tech Media March 20, 2026

Impact of modelling assumptions on time horizon results

As METR’s time horizon task suite saturates, the results are becoming more sensitive to analysis choices. One example of this was the recent update to fix…

METR Tech Media March 19, 2026

We spent 2 hours working in the future

METR aims to keep the public informed about the capabilities of and risks posed by AI, by some metrics the fastest-moving technology in history,…

METR Tech Media March 12, 2026

Review of the Anthropic Sabotage Risk Report: Claude Opus 4.6

We reviewed two versions of Anthropic’s Sabotage Risk Report for Claude Opus 4.6, producing two corresponding review documents: our review of the February 11 version and…

METR Tech Media March 10, 2026

Many SWE-bench-Passing PRs Would Not Be Merged into Main

Summary: We find that roughly half of test-passing SWE-bench Verified PRs written by mid-2024 to mid/late-2025 agents would…

METR Tech Media March 3, 2026

Observations from two CLI game reimplementation runs with Opus 4.6

Summary: Opus 4.6 can, with a simple agent scaffold, create mostly-playable but somewhat broken CLI versions of Slay the Spire and Balatro. Last weekend I…

METR Tech Media February 24, 2026

We are Changing our Developer Productivity Experiment Design

METR previously published a paper which found the use of AI tools caused a 20% slowdown in completing tasks among experienced open-source developers, using data from…

METR Tech Media February 19, 2026

Five lessons from having helped run an AI-Biology RCT

Evidence-based AI policy is important but hard. We need more in-depth studies – which…

METR Tech Media February 18, 2026

How We Protect Confidential Information

METR Tech Media February 17, 2026

Analyzing coding agent transcripts to upper bound productivity gains from AI agents

Human uplift studies like the one we did in 2025 are becoming more expensive as working without AI becomes increasingly costly. In this post, I…

METR Tech Media February 13, 2026

Measuring Time Horizon using Claude Code and Codex

Most of METR’s time horizon measurements are done using two scaffolds: Triframe and ReAct. People sometimes see that we use these two scaffolds and feel skeptical…
