arXiv cs.CV · Papers

Position: Reasoning After Perception Means Reasoning Without Vision

arXiv:2507.16863v2 Announce Type: replace

Abstract: A common belief in multimodal research is that the perceptual weaknesses of vision-language models can be compensated by stronger language reasoning (e.g., chain-of-thought, in-context learning, or external tools). We challenge this assumption. We argue that for a br