arXiv cs.CL
Ground Then Rank: Revisiting Knowledge-Based VQA with Training-Free Entity Identification
arXiv:2606.23881v1 Announce Type: new Abstract: Knowledge-Based Visual Question Answering (KB-VQA) requires grounding visual queries in external knowledge beyond the directly observable content of images. While recent multimodal large language models (MLLMs) show strong perceptual abilities, they struggle on KB-VQA tasks