Anthropic Engineering
· Frontier Labs
Quantifying infrastructure noise in agentic coding evals
Infrastructure configuration can swing agentic coding benchmarks by several percentage points, sometimes more than the leaderboard gap between top models.