
TUA-Bench: A Benchmark for General-Purpose Terminal-Use Agents

As large language models and harness frameworks continue to advance, agents operating in terminals can handle an increasingly broad range of general computer-use tasks beyond coding. However, existing benchmarks do not adequately evaluate general-purpose terminal computer-use agents