arXiv cs.CV June 24, 2026 · Papers

REALM: A Unified Red-Teaming Benchmark for Physical-World VLMs

arXiv:2606.23892v1 Announce Type: new

Abstract: Vision-language models (VLMs) are increasingly used as perception-reasoning backbones for embodied intelligence in safety-critical physical systems, where perception or reasoning errors can lead to unsafe decisions or actions. Although many red-teaming methods have been d
