Anthropic Engineering · Frontier Labs

Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet

SWE-bench is an AI evaluation benchmark that assesses a model's ability to complete real-world software engineering tasks.