News from Alibaba Qwen

Alibaba Qwen News March 27, 2025

QVQ-Max: Think with Evidence

Last December, we launched QVQ-72B-Preview as an exploratory model, but it had many issues. Today, we are officially…

Alibaba Qwen News March 26, 2025

Qwen2.5 Omni: See, Hear, Talk, Write, Do It All!

We release Qwen2.5-Omni, the new flagship end-to-end multimodal model in the Qwen series. Designed for comprehensive…

Alibaba Qwen News March 23, 2025

Qwen2.5-VL-32B: Smarter and Lighter

At the end of January this year, we launched the Qwen2.5-VL series of models, which received widespread attention…

Alibaba Qwen News March 5, 2025

QwQ-32B: Embracing the Power of Reinforcement Learning

Scaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studies…

Alibaba Qwen News February 24, 2025

<think>…</think> QwQ-Max-Preview

This is a blog created by QwQ-Max-Preview. We hope you enjoy it! Okay, the user wants me to create a title and…

Alibaba Qwen News January 28, 2025

Qwen2.5-Max: Exploring the Intelligence of Large-scale MoE Model

It is widely recognized that continuously scaling both data size and model size can lead to significant improvements in model intelligence.…

Alibaba Qwen News January 26, 2025

Qwen2.5-1M: Deploy Your Own Qwen with Context Length up to 1M Tokens

Two months after upgrading Qwen2.5-Turbo to support context length up to one million tokens,…

Alibaba Qwen News January 26, 2025

Qwen2.5 VL! Qwen2.5 VL! Qwen2.5 VL!

We release Qwen2.5-VL, the new flagship vision-language model of Qwen and also a significant leap from the previous Qwen2-VL.…

Alibaba Qwen News January 20, 2025

Global-batch load balance almost free lunch to improve your MoE LLM training

The Mixture-of-Experts (MoE) architecture has become a popular technique for scaling up model parameters. Typically, one MoE layer consists of a router (often parameterized…

Alibaba Qwen News January 13, 2025

Towards Effective Process Supervision in Mathematical Reasoning

In recent years, Large Language Models (LLMs) have made remarkable advances in mathematical reasoning, yet they can make mistakes, such…
