Anthropic Engineering · Frontier Labs

Quantifying infrastructure noise in agentic coding evals

Infrastructure configuration can swing agentic coding benchmarks by several percentage points, sometimes more than the leaderboard gap between top models.