Anthropic Engineering
Raising the bar on SWE-bench Verified with Claude 3.5 Sonnet
SWE-bench is an AI evaluation benchmark that assesses a model's ability to complete real-world software engineering tasks.