FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning
Multimodal driving planning faces a long-standing tension between two paradigms: scoring-based methods benefit from dense reward supervision but are confined to a fixed action vocabulary, while…