HF Daily Papers
ProMSA: Progressive Multimodal Search Agents for Knowledge-Based Visual Question Answering
Knowledge-based Visual Question Answering (KB-VQA) requires models to combine image understanding with external knowledge. Most prior methods use a fixed retrieve-then-generate pipeline with a pre-selected retriever and a static top-k setting, which is not adaptive during reasoning. We propose ProMS