X · @teortaxesTex
Good multi-hop eval, hard to construct at scale though: "NotSupportedByCitedSourceBench"
Guive Assadi: I asked Claude this question and it said yes, actually, the Dhofar War. Based on the Dhofar War's reputation for insane brutality, I thought that sounded weird. Claude's source turned out to be a paper that said: "in the