Merge pull request #41 from qscqesze/main Update vLLM Version Requirements in Documentation
Merge branch 'main' of https://github.com/qscqesze/MiniMax-01
Update the vllm_deployment_guild_cn.md and vllm_deployment_guild.md files to include the version requirements and relevant build instructions for the MiniMax-Text-01 model.
How I started, why I write, who I write for, how I write, and more.
Welcome to the next stage of large language models (LLMs): reasoning. LLMs have transformed how we process and generate text, but their success has been largely…
Introduction: Last December, we launched QVQ-72B-Preview as an exploratory model, but it had many issues. Today, we are officially…
We release Qwen2.5-Omni, the new flagship end-to-end multimodal model in the Qwen series. Designed for comprehensive…
I’m freezing this blog and starting to post on my Substack instead. The authoring experience is much more convenient for me there. Please follow me there,…
Build Fast with Text-to-Speech AI – Dialog Model on Groq
Training Diffusion Models with Reinforcement Learning We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone.…
Introduction: At the end of January this year, we launched the Qwen2.5-VL series of models, which received widespread attention…
RT Kai-Fu Lee: The biggest revelation from DeepSeek is that Open Source has won. For a 1% difference in performance, it will be difficult for OpenAI to…
RT Kai-Fu Lee: DeepSeek is becoming a Windows kernel demanded by businesses, but http://01.AI is aspired to build the Windows system and interface to ignite it. Check…
A new tool that improves Claude's complex problem-solving performance
Chip Huyen and I share what we've learned, best practices, and insights at NVIDIA GTC 2025.
Model architectures, data generation, training paradigms, and unified frameworks inspired by LLMs.
Scaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studies…
What's Changed: "fix: do not use python_tag when encoding non-code_interpreter tool_calls" by @ehhuang in #283; "fix: tool_call was not encoded" by @ehhuang in #284. Full Changelog:…
This is a blog created by QwQ-Max-Preview. We hope you enjoy it! Introduction: Okay, the user wants me to create a title and…
Add trt support for BF16 (#195) * fix interface of `get_sample_input` * save configuration parameters * ae wrapper implemented * fix import * add AEWrapper step…
My thoughts on China, export controls and two possible futures https://darioamodei.com/on-deepseek-and-export-controls
It is widely recognized that continuously scaling both data size and model size can lead to significant improvements in model intelligence.…
DeepSeek's API has been experiencing reliability issues. Here are alternative providers you can use.
Introduction: Two months after upgrading Qwen2.5-Turbo to support context length up to one million tokens,…