Video-MME-Logical: A Controlled Diagnostic Benchmark for Video Temporal-Logical Reasoning
Recent interest in multimodal large language models (MLLMs) raises a central question: can they reason over dynamic visual evidence rather than merely recognize objects or events in individual frames? This ability, which we refer to as video temporal-logical reasoning, requires models to maintain, u