HF Daily Papers June 29, 2026 · Papers

ProMSA: Progressive Multimodal Search Agents for Knowledge-Based Visual Question Answering

Knowledge-based Visual Question Answering (KB-VQA) requires models to combine image understanding with external knowledge. Most prior methods use a fixed retrieve-then-generate pipeline with a pre-selected retriever and a static top-k setting, so retrieval cannot adapt during reasoning. We propose ProMS
