arXiv cs.CL June 24, 2026 · Papers

Ground Then Rank: Revisiting Knowledge-Based VQA with Training-Free Entity Identification

arXiv:2606.23881v1

Abstract: Knowledge-Based Visual Question Answering (KB-VQA) requires grounding visual queries to external knowledge beyond directly observable content in images. While recent multimodal large language models (MLLMs) show strong perceptual abilities, they struggle on KB-VQA tasks
