Tech Media news · AI Feed

Sequoia Capital Tech Media March 31, 2026

From Hierarchy to Intelligence

METR Tech Media March 26, 2026

Red-Teaming Anthropic's Internal Agent Monitoring Systems

Update: Further details on this exercise are included in our Frontier Risk Report (February-March 2026), within the Anthropic section of Appendix B. In collaboration with Anthropic,…

METR Tech Media March 20, 2026

Impact of modelling assumptions on time horizon results

As METR’s time horizon task suite saturates, the results are becoming more sensitive to analysis choices. One example of this was the recent update to fix…

METR Tech Media March 19, 2026

We spent 2 hours working in the future

Introduction: METR aims to keep the public informed about the capabilities of and risks posed by AI, by some metrics the fastest-moving technology in history,…

Sequoia Capital Tech Media March 18, 2026

Partnering with Edra: Context for Agents at Scale

METR Tech Media March 12, 2026

Review of the Anthropic Sabotage Risk Report: Claude Opus 4.6

We reviewed two versions of Anthropic’s Sabotage Risk Report for Claude Opus 4.6, producing two corresponding review documents: our review of the February 11 version and…

Sequoia Capital Tech Media March 10, 2026

Partnering with Scanner: Every Log Tells a Story—If You Can Find It Fast Enough

METR Tech Media March 10, 2026

Many SWE-bench-Passing PRs Would Not Be Merged into Main

Summary: We find that roughly half of test-passing SWE-bench Verified PRs written by mid-2024 to mid/late-2025 agents would…

METR Tech Media March 3, 2026

Observations from two CLI game reimplementation runs with Opus 4.6

Summary: Opus 4.6 can, with a simple agent scaffold, create mostly-playable but somewhat broken CLI versions of Slay the Spire and Balatro. Intro: Last weekend I…

METR Tech Media February 24, 2026

We are Changing our Developer Productivity Experiment Design

METR previously published a paper which found the use of AI tools caused a 20% slowdown in completing tasks among experienced open-source developers, using data from…

METR Tech Media February 19, 2026

Five lessons from having helped run an AI-Biology RCT

Evidence-based AI policy is important but hard. We need more in-depth studies – which…

METR Tech Media February 18, 2026

How We Protect Confidential Information

METR Tech Media February 17, 2026

Analyzing coding agent transcripts to upper bound productivity gains from AI agents

Introduction: Human uplift studies like the one we did in 2025 are becoming more expensive as working without AI becomes increasingly costly. In this post, I…

METR Tech Media February 13, 2026

Measuring Time Horizon using Claude Code and Codex

Most of METR’s time horizon measurements are done using two scaffolds: Triframe and ReAct. People sometimes see that we use these two scaffolds and feel skeptical…

METR Tech Media February 10, 2026

A simpler AI timelines model predicts 99% AI R&D automation in ~2032

In this post, I describe a simple model for forecasting when AI will automate AI development. It is based on the AI Futures model, but more…

METR Tech Media January 29, 2026

Frontier AI safety regulations: A reference for lab staff (Spanish edition)

Frontier AI developers such as OpenAI, Google, Anthropic, xAI, and others have safety and security obligations under SB 53…

METR Tech Media January 29, 2026

Frontier AI safety regulations: A reference for lab staff

Frontier AI developers such as OpenAI, Google, Anthropic, xAI, and others are governed by safety and security obligations under California’s SB 53, New…

METR Tech Media January 29, 2026

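Frontier AI safety regulations: A reference guide for AI company staff (Chinese edition)

Frontier AI developers such as OpenAI, Google, Anthropic, and xAI, as well as some Chinese AI developers, are already required to comply with a number of safety and security obligations. The main sources are California's SB 53, New York's RAISE Act, and the frontier AI provisions of the EU AI Act. Whether a regulation applies depends not on where a company is incorporated but on where its models are deployed and how the company does business there. The requirements cover incident reporting, model evaluation, safety and security mitigations, internal governance, and whistleblower protections. This article only outlines the key provisions and is not a substitute for the official legal texts. Table columns: law, covered entities, risks, obligations, timeline. California SB 53: uses >10^26 FLOPs…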

METR Tech Media January 29, 2026

Time Horizon 1.1

We’re releasing a new version of our time horizon estimates (TH1.1), using more tasks and a new eval…

Eugene Yan Tech Media December 14, 2025

2025 Year in Review

An eventful year of progress in health and career, while making time for travel and reflection.

Eugene Yan Tech Media November 23, 2025

Product Evals in Three Simple Steps

Label some data, align LLM-evaluators, and run the eval harness with each change.

Eugene Yan Tech Media October 19, 2025

Advice for New Principal Tech ICs (i.e., Notes to Myself)

Based on what I've learned from role models and mentors at Amazon

Eugene Yan Tech Media September 14, 2025

Training an LLM-RecSys Hybrid for Steerable Recs with Semantic IDs

An LLM that can converse in English & item IDs, and make recommendations w/o retrieval or tools.

Hamel Husain Tech Media June 23, 2025

Inspect AI, An OSS Python Library For LLM Evals
