FlowR2A: Learning Reward-to-Action Distribution for Multimodal Driving Planning
Multimodal driving planning faces a long-standing tension between two paradigms: scoring-based methods benefit from dense reward supervision but are confined to a fixed action vocabulary, while anchor-based methods generate proposals dynamically yet suffer from sparse supervision constrained to a si