arXiv cs.CV June 24, 2026 · Papers

ABACUS: Adapting Unified Foundation Model for Bridging Image Count Understanding and Generation

arXiv:2606.23835v1 Announce Type: new Abstract: ABACUS is a unified vision-language model that handles object counting, crowd counting, referring-expression counting, and count-faithful image generation without requiring any benchmark-specific training. Our model is built on existing 3B-parameter unified foundation mode
