AI Snake Oil (Narayanan)
Open-world evaluations for measuring frontier AI capabilities
Introducing CRUX, a new project for evaluating AI on long, messy tasks