First Look at Reasoning From Scratch: Chapter 1
Welcome to the next stage of large language models (LLMs): reasoning. LLMs have transformed how we process and generate text, but their success has been largely…
xAI acquires X in all-stock merger — $80B for xAI, $33B for X, combined entity $113B under xAI Holdings Corp. CoreWeave (CRWV) debuts on Nasdaq at…
Anthropic ships a landmark double-paper on Claude 3.5 Haiku's internal mechanisms — circuit tracing, multistep planning, cross-linguistic generalization. CoreWeave prices its IPO at $40/share, raising $1.5B…
Last December, we launched QVQ-72B-Preview as an exploratory model, but it had many issues. Today, we are officially…
OpenAI delays 4o image gen rollout to Free tier as demand 'wayyyy more popular than we expected.' Alibaba open-sources Qwen2.5-Omni-7B end-to-end multimodal model under Apache 2.0.…
We release Qwen2.5-Omni, the new flagship end-to-end multimodal model in the Qwen series. Designed for comprehensive…
I’m freezing this blog and starting to post on my Substack instead. The authoring experience is much more convenient for me there. Please follow me there,…
Build Fast with Text-to-Speech AI – Dialog Model on Groq
Frontier collision day. OpenAI ships native 4o image generation in ChatGPT and Sora, killing DALL-E 3. Google DeepMind drops Gemini 2.5 Pro Experimental — #1 on…
Training Diffusion Models with Reinforcement Learning We deployed 100 reinforcement learning (RL)-controlled cars into rush-hour highway traffic to smooth congestion and reduce fuel consumption for everyone.…
DeepSeek drops V3-0324 on Hugging Face with no model card, no blog, MIT license, 685GB weights — Aider polyglot jumps 9.3, AIME jumps 19.8. Runs on…
Quiet Sunday before a big Monday. Western press finally catches up on Tencent's Hunyuan-T1 reasoning model. xAI Grok standalone app continues weekend rollout. No fresh announcements…
At the end of January this year, we launched the Qwen2.5-VL series of models, which received widespread attention…
Quiet Saturday after a heavy GTC week. xAI launches a standalone Grok iOS app, decoupling the chatbot from X for the first time. No major frontier-lab…
RT Kai-Fu Lee: The biggest revelation from Deepseek is that Open Source has won. For a 1% difference in performance, it will be difficult for OpenAI to…
Tencent ships Hunyuan-T1 — first ultra-large hybrid Mamba-Transformer MoE reasoning model, matching DeepSeek-R1 and beating GPT-4.5 on MMLU-Pro at ~99% lower price than o1. NVIDIA closes…
NVIDIA hosts inaugural Quantum Day at GTC with D-Wave, IonQ, Rigetti, Quantinuum, PsiQuantum and others sharing a stage. Anthropic ships web search for Claude. Foxconn showcases…
RT Kai-Fu Lee: DeepSeek is becoming a Windows kernel demanded by businesses, but http://01.AI is aspired to build the Windows system and interface to ignite it. Check…
A new tool that improves Claude's complex problem-solving performance
GTC Day 3 fans out: Llama Nemotron Nano/Super/Ultra open reasoning models, Dynamo inference framework, Newton physics engine with DeepMind and Disney, Spectrum-X silicon photonics, NVAQC Boston…
Jensen's GTC keynote: Blackwell Ultra GB300, Vera Rubin roadmap to 2027, Isaac GR00T N1 open humanoid model, Dynamo open inference framework, Newton physics engine with DeepMind…
Chip Huyen and I share what we've learned, best practices, and insights at NVIDIA GTC 2025.
Mistral Small 3.1 24B drops with Apache 2.0 license, 128K context and multimodal. Roblox open-sources Cube 3D foundation model. NVIDIA GTC 2025 + GDC 2025 both…
Baidu releases Ernie 4.5 and X1 reasoning model. NVIDIA GTC 2025 opens with pre-conference workshops in San Jose ahead of Jensen's keynote on Tuesday. Otherwise a…