From Hierarchy to Intelligence
The post From Hierarchy to Intelligence appeared first on Sequoia Capital.
Update: Further details on this exercise are included in our Frontier Risk Report (February-March 2026), within the Anthropic section of Appendix B. In collaboration with Anthropic,…
As METR’s time horizon task suite saturates, the results are becoming more sensitive to analysis choices. One example of this was the recent update to fix…
Introduction METR aims to keep the public informed about the capabilities of and risks posed by AI — by some metrics the fastest-moving technology in history,…
The post Partnering with Edra: Context for Agents at Scale appeared first on Sequoia Capital.
We reviewed two versions of Anthropic’s Sabotage Risk Report for Claude Opus 4.6, producing two corresponding review documents: our review of the February 11 version and…
The post Partnering with Scanner: Every Log Tells a Story—If You Can Find It Fast Enough appeared first on Sequoia Capital.
Summary: We find that roughly half of test-passing SWE-bench Verified PRs written by mid-2024 to mid/late-2025 agents would…
Summary: Opus 4.6 can, with a simple agent scaffold, create mostly-playable but somewhat broken CLI versions of Slay the Spire and Balatro. Intro Last weekend I…
METR previously published a paper which found the use of AI tools caused a 20% slowdown in completing tasks among experienced open-source developers, using data from…
Evidence-based AI policy is important but hard. We need more in-depth studies – which…
Introduction Human uplift studies like the one we did in 2025 are becoming more expensive as working without AI becomes increasingly costly. In this post, I…
Most of METR’s time horizon measurements are done using two scaffolds: Triframe and ReAct. People sometimes see that we use these two scaffolds and feel skeptical…
In this post, I describe a simple model for forecasting when AI will automate AI development. It is based on the AI Futures model, but more…
Frontier AI developers such as OpenAI, Google, Anthropic, xAI, and others have safety and security obligations under SB 53…
Frontier AI developers such as OpenAI, Google, Anthropic, xAI, and others are governed by safety and security obligations under California’s SB 53, New…
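Frontier AI developers such as OpenAI, Google, Anthropic, and xAI, as well as some Chinese AI developers, are already required to comply with a number of safety and security obligations. The main sources include California's SB 53, New York's RAISE Act, and the frontier AI provisions of the EU AI Act. Whether these regulations apply depends not on where a company is registered, but on where its models are deployed and how the company does business there. The requirements cover incident reporting, model evaluation, safety and security mitigations, internal governance, and whistleblower protection. This article only summarizes the key provisions and is not a substitute for the official legal texts. Law | Applies to | Risks | Obligations | Timeline | California SB 53 | Uses >10^26 FLOPs…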
We’re releasing a new version of our time horizon estimates (TH1.1), using more tasks and a new eval…
An eventful year of progress in health and career, while making time for travel and reflection.
Label some data, align LLM-evaluators, and run the eval harness with each change.
Based on what I've learned from role models and mentors at Amazon.
An LLM that can converse in English & item IDs, and make recommendations w/o retrieval or tools.
Inspect AI, An OSS Python Library For LLM Evals