@teortaxesTex on X (Twitter)

Good multi-hop eval, though hard to construct at scale: "NotSupportedByCitedSourceBench"

Quoting Guive Assadi: I asked Claude this question and it said yes, actually, the Dhofar War. Based on the Dhofar War's reputation for insane brutality, I thought that sounded weird. Claude's source turned out to be a paper that said: "in the