Qwen3-VL (GitHub) · China Labs

Add VideoMME evaluation benchmark

- Add VideoMME dataset evaluation pipeline with vLLM inference
- Support short/medium/long video duration types
- Include subtitle integration capability
- Implement two-stage answer extraction (rule-based + LLM-based)
- Follow same structure as mmmu evaluation benchmark
- Add comprehe
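
The two-stage answer extraction mentioned in the commit message could look roughly like the minimal sketch below. It is not the code from this commit: it assumes multiple-choice questions with options labelled A to D, and the names `extract_rule_based`, `extract_answer`, and the `llm_fallback` callable are hypothetical placeholders for whatever the pipeline actually uses.

```python
import re
from typing import Callable, Optional

# Assumption: each question has four options labelled A-D.
VALID_CHOICES = "ABCD"

# Stage 1 patterns, tried in order (illustrative, not exhaustive).
_PATTERNS = [
    # "The answer is B", "Answer: (C)"; the keyword is case-insensitive,
    # the option letter must be uppercase to avoid matching the article "a".
    re.compile(r"(?i:answer\s*(?:is|:)?\s*)\(?([A-D])\)?(?![A-Za-z])"),
    # A response that starts with a bare option: "C", "(D)", "A. The man ...".
    re.compile(r"^\s*\(?([A-D])\)?\s*(?:[.:)]|$)"),
]


def extract_rule_based(response: str) -> Optional[str]:
    """Stage 1: return an option letter if a simple pattern matches."""
    for pattern in _PATTERNS:
        match = pattern.search(response)
        if match:
            return match.group(1)
    return None


def extract_answer(
    response: str,
    llm_fallback: Optional[Callable[[str], str]] = None,
) -> Optional[str]:
    """Run the rule-based stage first, then an optional LLM-based stage.

    `llm_fallback` is a hypothetical callable that takes the raw model
    response and returns a single option letter as text.
    """
    answer = extract_rule_based(response)
    if answer is not None or llm_fallback is None:
        return answer
    # Stage 2: accept the LLM's reply only if it is a valid option letter.
    candidate = llm_fallback(response).strip().upper()
    if len(candidate) == 1 and candidate in VALID_CHOICES:
        return candidate
    return None
```

Running the cheap rule-based stage first means the LLM-based stage is only invoked for free-form responses that the patterns cannot parse, and validating the fallback's reply keeps malformed output from being scored as an answer.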