arXiv cs.CV June 25, 2026 · Papers

AMVICC: A Novel Benchmark for Cross-Modal Failure Mode Profiling for VLMs and IGMs

arXiv:2601.17037v2 Announce Type: replace Abstract: We investigate visual reasoning limitations of both multimodal large language models (MLLMs) and image generation models (IGMs) by creating a novel benchmark to systematically compare failure modes across image-to-text and text-to-image tasks, enabling cross-modal eva

Read original