AI Daily Brief — 17 February 2025
Monday delivered xAI’s biggest swing of the year. On the Grok 3 livestream, Musk, co-founders Jimmy Ba and Yuhuai “Tony” Wu, and lead engineer Igor Babuschkin pitched the model as the best in the world at complex math, physics, and coding, citing 93.3% on AIME 2025, 84.6% on GPQA (graduate-level science), and 79.4% on LiveCodeBench. An early Grok 3 checkpoint, tested anonymously on LMArena under the codename “Chocolate,” became the first model to cross 1400 on the Chatbot Arena leaderboard, scoring 1402 and taking #1 in every category. xAI also introduced DeepSearch, its first agent, alongside Grok 3’s Big Brain and Think reasoning modes. The model was trained on Colossus, the 200,000-GPU Memphis cluster (built to 100K H100s in 122 days, then doubled to 200K in another 92), with roughly 10x the compute used for Grok 2. Elsewhere, healthcare ambient-AI startup Abridge closed a $250 million Series D at a $2.75 billion valuation, and Humane told customers the AI Pin would be discontinued, with HP acquiring its CosmOS platform, patents, and most staff for $116 million.
Top stories
- xAI launches Grok 3 and Grok 3 mini in live demo. Musk, Jimmy Ba, Yuhuai Wu, and Igor Babuschkin pitched the release as the start of “the age of reasoning agents.” via xAI
- Grok 3 benchmark sweep: 93.3% AIME 2025, 84.6% GPQA, 79.4% LiveCodeBench. With Big Brain / Think reasoning enabled and extra test-time compute (cons@64), xAI reported the model beating GPT-4o, Gemini 2, and DeepSeek-V3 on the math, science, and coding benchmarks it presented at launch. via xAI
- “Chocolate” tops Chatbot Arena as the first model to break 1400. The early Grok 3 checkpoint, tested anonymously on LMArena, scored 1402. via Analytics Vidhya
- DeepSearch — xAI’s first agent — debuts alongside Grok 3. Research/answer agent designed to scour the web, reason about conflicting facts, and synthesize cited answers. via xAI
- Grok 3 trained on Colossus, 200K-GPU Memphis cluster. Built to 100K Nvidia H100s in 122 days and doubled to 200K H100s in another 92 days — roughly 10x Grok 2 compute (~200M H100-hours of pre-training). via Capacity
- Abridge closes $250M Series D at $2.75B. The round for the healthcare ambient-AI startup was co-led by Elad Gil and IVP; Bessemer, CapitalG, CVS Health Ventures, Lightspeed, NVentures (NVIDIA), Redpoint, and Spark joined. Abridge has crossed 100 US health-system deployments and launched a Contextual Reasoning Engine. via Fortune
- Humane AI Pin discontinued; HP buys assets for $116M. All cloud functionality ending Feb 28 at 12:00 PT; user data wiped. HP Inc. acquiring Humane’s CosmOS platform, patents, and most staff. Refunds limited to devices shipped on/after Nov 15, 2024. Deal formally announced Feb 18. via TechCrunch
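A note on the cons@64 figure cited in the benchmark item above: the label is commonly read as "consensus at 64," meaning the model answers each problem 64 times and the most frequent answer is the one graded. The sketch below illustrates that reading only. xAI's actual evaluation harness is not described in this brief, and the function names and toy data here are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def consensus_answer(samples: Sequence[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    return Counter(samples).most_common(1)[0][0]


def cons_at_k(problems: Sequence[Tuple[Sequence[str], str]], k: int = 64) -> float:
    """Fraction of problems whose majority answer over the first k samples
    matches the reference answer (assumed reading of cons@k)."""
    if not problems:
        raise ValueError("need at least one problem")
    correct = sum(
        1
        for samples, reference in problems
        if consensus_answer(samples[:k]) == reference
    )
    return correct / len(problems)


# Toy data (made up): each entry is (sampled answers, reference answer).
toy_problems = [
    (["42", "41", "42"], "42"),  # majority vote is right
    (["7", "8", "8"], "7"),      # majority vote is wrong
]
toy_score = cons_at_k(toy_problems, k=64)
```

Because majority voting spends 64 samples per problem, a cons@64 score is not directly comparable to a single-attempt score from another model.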
Who shipped
xAI shipped Grok 3 + DeepSearch. OpenAI, Anthropic, and Google DeepMind had no dated launches.
Money, infra & hardware
Grok 3 rolled out exclusively to X Premium and Premium+ subscribers and on Grok.com; Think, Big Brain, and DeepSearch reserved for Premium+. New top-tier “SuperGrok” plan previewed; voice mode and API promised “in the coming weeks.” via TechCrunch
By the numbers
- 93.3% / 84.6% / 79.4% — Grok 3 (Think) AIME 2025 / GPQA / LiveCodeBench
- 1402 — Chocolate’s Chatbot Arena score, the first above 1400
- 200K / ~10x — Colossus GPU count / Grok 2 compute multiple
- $250M / $2.75B — Abridge Series D / valuation
- $116M — HP-Humane acquisition price
- Most-mentioned lab: xAI
Compiled by AI Feed’s editor from verified web sources for 17 February 2025.