Skip to content
AI Snake Oil (Narayanan) · Newsletters

Open-world evaluations for measuring frontier AI capabilities

Introducing CRUX, a new project for evaluating AI on long, messy tasks