X · @jeremyphoward · X / Twitter

RT Artificial Analysis: Announcing AA-Briefcase, the benchmark for the next era of agentic knowledge work. AA-Briefcase is our new benchmark for testing models on long-horizon knowledge work tasks in complex projects built by industry experts. Models are evaluated on multi-week projects, each with many linked tasks and tho…