AI Feed

MiniMax-01 (GitHub) China Labs April 2, 2025

Merge pull request #41 from qscqesze/main

Update vLLM Version Requirements in Documentation

MiniMax-01 (GitHub) China Labs April 2, 2025

Merge branch ‘main’ of https://github.com/qscqesze/MiniMax-01


MiniMax-01 (GitHub) China Labs April 2, 2025

Update the vllm_deployment_guild_cn.md and vllm_deployment_guild.md f…

Update the vllm_deployment_guild_cn.md and vllm_deployment_guild.md files to include the version requirements and relevant build instructions for the MiniMax-Text-01 model.

Eugene Yan Tech Media March 30, 2025

Frequently Asked Questions about My Writing Process

How I started, why I write, who I write for, how I write, and more.

Ahead of AI (Raschka) Newsletters March 29, 2025

First Look at Reasoning From Scratch: Chapter 1

Welcome to the next stage of large language models (LLMs): reasoning. LLMs have transformed how we process and generate text, but their success has been largely…

Alibaba Qwen News March 27, 2025

QVQ-Max: Think with Evidence

Last December, we launched QVQ-72B-Preview as an exploratory model, but it had many issues. Today, we are officially…

Alibaba Qwen News March 26, 2025

Qwen2.5 Omni: See, Hear, Talk, Write, Do It All!

We release Qwen2.5-Omni, the new flagship end-to-end multimodal model in the Qwen series. Designed for comprehensive…

Jay Alammar Tech Media March 26, 2025

Moving To Substack

I’m freezing this blog and starting to post on my Substack instead. The authoring experience is much more convenient for me there. Please follow me there,…

Groq Infrastructure March 26, 2025

Build Fast with Text-to-Speech AI – Dialog Model on Groq

BAIR Berkeley Open Source March 25, 2025

Scaling Up Reinforcement Learning for Traffic Smoothing: A 100-AV Highway Deployment

We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone.…

Alibaba Qwen News March 23, 2025

Qwen2.5-VL-32B: Smarter and Lighter

At the end of January this year, we launched the Qwen2.5-VL series of models, which received widespread attention…

X · @01AI_Yi China Labs March 22, 2025

RT Kai-Fu Lee: The biggest revelation from Deepseek is that Open Source has won. For a 1% difference in performance, it will be difficult for OpenAI t…

RT Kai-Fu Lee: The biggest revelation from Deepseek is that Open Source has won. For a 1% difference in performance, it will be difficult for OpenAI to…

X · @01AI_Yi China Labs March 20, 2025

RT Kai-Fu Lee: DeepSeek is becoming a Windows kernel demanded by businesses, but http://01.AI is aspired to build the Windows system and interface to …

RT Kai-Fu Lee: DeepSeek is becoming a Windows kernel demanded by businesses, but http://01.AI is aspired to build the Windows system and interface to ignite it. Check…

Anthropic Engineering Frontier Labs March 20, 2025

The "think" tool: Enabling Claude to stop and think in complex tool use situations

A new tool that improves Claude's complex problem-solving performance

Eugene Yan Tech Media March 18, 2025

NVIDIA GTC 2025 – Building LLM-Powered Applications

Chip Huyen and I share what we've learned, best practices, and insights at NVIDIA GTC 2025.

Eugene Yan Tech Media March 16, 2025

Improving Recommendation Systems & Search in the Age of LLMs

Model architectures, data generation, training paradigms, and unified frameworks inspired by LLMs.

Alibaba Qwen News March 5, 2025

QwQ-32B: Embracing the Power of Reinforcement Learning

Scaling Reinforcement Learning (RL) has the potential to enhance model performance beyond conventional pretraining and post-training methods. Recent studies…

Meta Llama (GitHub) Frontier Labs February 25, 2025

v0.1.4

What's Changed: fix: do not use python_tag when encoding non-code_interpreter tool_calls (by @ehhuang in #283); fix: tool_call was not encoded (by @ehhuang in #284). Full Changelog:…

Alibaba Qwen News February 24, 2025

<think>…</think> QwQ-Max-Preview

This is a blog created by QwQ-Max-Preview. We hope you enjoy it! Introduction: Okay, the user wants me to create a title and…

FLUX (Black Forest Labs) Generative Media January 31, 2025

Add trt support for BF16 (#195)

Fix interface of `get_sample_input`; save configuration parameters; ae wrapper implemented; fix import; add AEWrapper step…

X · @darioamodei X / Twitter January 29, 2025

My thoughts on China, export controls and two possible futures https://darioamodei.com/on-deepseek-and-export-controls

Alibaba Qwen News January 28, 2025

Qwen2.5-Max: Exploring the Intelligence of Large-scale MoE Model

It is widely recognized that continuously scaling both data size and model size can lead to significant improvements in model intelligence.…

Aider Infrastructure January 28, 2025

Alternative DeepSeek V3 providers

DeepSeek's API has been experiencing reliability issues. Here are alternative providers you can use.

Alibaba Qwen News January 26, 2025

Qwen2.5-1M: Deploy Your Own Qwen with Context Length up to 1M Tokens

Two months after upgrading Qwen2.5-Turbo to support context length up to one million tokens,…

