arXiv cs.CL
When Retrieval Metrics Mislead: Measuring Policy Signal in Long-Horizon Tool-Use Agents
arXiv:2606.23937v1 Announce Type: new
Abstract: Exact-match retrieval recall is often used as a proxy for whether a retriever supplies useful policy context to a downstream decision model. We test this proxy for pre-action policy classification in tau-bench using Qwen2.5-3B/7B classifiers. Under gold-policy conditionin