Skip to content
HF Daily Papers · Papers

ProMSA:Progressive Multimodal Search Agents for Knowledge-Based Visual Question Answering

Knowledge-based Visual Question Answering (KB-VQA) requires models to combine image understanding with external knowledge. Most prior methods use a fixed retrieve-then-generate pipeline with a pre-selected retriever and a static top-k setting, which is not adaptive during reasoning. We propose ProMS