arXiv cs.AI
MMGist: A Comprehensive Multimodal Benchmark for 2027
arXiv:2606.22437v2 (Announce Type: replace-cross)
Abstract: We conduct a systematic study of 18 widely used vision-language benchmarks and identify three major issues: 1) many items do not rely on visual cues and therefore fail to effectively measure multimodal understanding; 2) many items are already close to performanc