HF Daily Papers
TUA-Bench: A Benchmark for General-Purpose Terminal-Use Agents
As large language models and harness frameworks continue to advance, agents operating in terminals are increasingly capable of performing a broad range of general computer-use tasks beyond coding. However, existing benchmarks do not adequately evaluate general-purpose terminal computer-use agents.