arXiv cs.CV June 25, 2026 · Papers

Reflective VLA: In-Context Action Consequences Make VLAs Generalize

arXiv:2606.25215v1

Abstract: Most vision-language-action (VLA) models are reactive: they predict the next action from the current instruction and observation, implicitly assuming that the current observation fully specifies the action-relevant state. In embodied control, however, embodiment-specific
