AI Daily Brief — 18 February 2025
Tuesday brought a former OpenAI CTO out of stealth and the year’s most provocative open-source post-train. Mira Murati publicly unveiled Thinking Machines Lab, with herself as CEO, Barret Zoph as CTO, John Schulman as chief scientist, and ~29 employees recruited from OpenAI, Character AI, Google DeepMind, Meta, and Mistral. Its stated mission: build multimodal AI systems that work collaboratively with people, with a culture of open science. Perplexity AI shipped R1-1776 on Hugging Face under Apache 2.0: DeepSeek-R1 post-trained on a multilingual dataset of ~40,000 prompts spanning ~300 censored topics, with reasoning benchmarks on par with the base model. Grok 3’s day-two coverage turned to scrutiny: researchers flagged the benchmark-chart presentation (Grok 3 cons@64 scores set against rivals’ single-shot scores), and the chatbot itself publicly praised Sam Altman and criticised Musk. Stepfun’s Step-Video-T2V (30B, MIT, 204 frames) and Nous Research’s DeepHermes-3 (toggleable reasoning, 8B, Llama 3.1) spread on Hugging Face.
Top stories
- Mira Murati launches Thinking Machines Lab out of stealth. Former OpenAI CTO as CEO; Barret Zoph (ex-OpenAI) as CTO; John Schulman as chief scientist. ~29 employees recruited from OpenAI, Character AI, Google DeepMind, Meta, and Mistral. Mission: multimodal AI systems working collaboratively with people; culture of open science. via TechCrunch
- Perplexity releases R1-1776 open-source reasoning model (Apache 2.0). Post-trained variant of DeepSeek-R1 stripped of CCP-style content filters. Built by identifying ~300 censored topics and training on a multilingual dataset of ~40,000 prompts; reasoning benchmarks remain on par with base DeepSeek-R1. via Hugging Face
- Grok 3 day-two: benchmark-chart scrutiny. AI researchers flagged that xAI’s chart gave Grok 3 a second, lighter-shaded segment for “consensus@64” (cons@64: the majority-vote answer across 64 sampled attempts) while GPT-4o, Claude 3.5 Sonnet, and DeepSeek-V3 were shown with single-shot scores, a like-for-unlike presentation that drew criticism on X. via Esade
- Grok 3 publicly contradicts Musk, sides with Altman. Hours after launch, users surfaced exchanges in which Grok 3 itself praised Altman and criticised Musk’s behaviour, fueling debate about the “maximum truth-seeking AI” framing. Coverage framed it as Grok “betraying” its creator. via Yahoo News
- Stepfun ships Step-Video-T2V — 30B, MIT, 204 frames. 16×16 spatial + 8× temporal compression VAE; bilingual (Chinese/English) prompts; DPO applied in final stage. Inference code and weights on Hugging Face / GitHub. Turbo variant alongside. via GitHub
- Nous Research DeepHermes-3: first toggle-on reasoning model. 8B, Llama-3.1-based, with toggleable <think> tags: switch between fast intuitive answers and longer chain-of-thought. ~67% on MATH at 8B. via VentureBeat
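Why the cons@64 versus single-shot distinction in the Grok 3 item matters can be shown with a toy simulation. This is only a sketch under stated assumptions, not xAI’s evaluation method: the function names, the 40% per-sample accuracy, and the assumption that samples are independent with wrong answers spread over four alternatives are all hypothetical, and independence in particular overstates the gain a real model would see.

```python
import random
from collections import Counter


def consensus_at_k(samples):
    """Return the most common answer among the sampled answers (majority vote)."""
    return Counter(samples).most_common(1)[0][0]


def simulate(num_problems=2000, k=64, p_correct=0.4, num_wrong=4, seed=0):
    """Compare single-shot accuracy with cons@k accuracy on a toy model.

    Hypothetical setup: each sample is correct with probability p_correct,
    otherwise it is one of num_wrong distinct wrong answers, chosen uniformly.
    Samples are drawn independently, which real model outputs are not.
    """
    rng = random.Random(seed)
    single_hits = 0
    consensus_hits = 0
    for _ in range(num_problems):
        samples = [
            "right" if rng.random() < p_correct else f"wrong{rng.randrange(num_wrong)}"
            for _ in range(k)
        ]
        # Single-shot: score only the first sample.
        single_hits += samples[0] == "right"
        # cons@k: score the majority-vote answer across all k samples.
        consensus_hits += consensus_at_k(samples) == "right"
    return single_hits / num_problems, consensus_hits / num_problems


if __name__ == "__main__":
    single, cons = simulate()
    print(f"single-shot accuracy: {single:.3f}")
    print(f"cons@64 accuracy:     {cons:.3f}")
```

In this toy setup the same underlying model scores much higher under cons@64 than under single-shot scoring, which is why placing the two kinds of number side by side on one chart is not a like-for-like comparison.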
Who shipped
Perplexity, Stepfun, and Nous shipped open weights. Mira Murati’s new lab announced its existence. OpenAI, Anthropic, Google DeepMind, Meta, and Mistral had no launches dated 18 February.
Open-source pulse
Three open-weight releases in 24 hours (R1-1776, Step-Video-T2V, DeepHermes-3), plus the Hugging Face Open-R1 OpenR1-Math refresh, made this the strongest open-source day of the month. The DeepSeek “ecosystem of derivatives” thesis from late January now had its first major Western-lab implementations.
Money, infra & hardware
Mistral’s Le Chat (mobile launch Feb 6) was on track to cross 1 million downloads; the milestone was officially announced on Feb 19, 13 days after launch. The app had briefly topped the French App Store free-downloads chart after Macron’s “download Le Chat instead of ChatGPT” call. via TechCrunch
Quiet corners
The unresolved Musk-Altman $97.4B saga kept circulating. Microsoft Research’s Muse / WHAM (World and Human Action Model), trained with Ninja Theory on ~7 years of Bleeding Edge gameplay, was set for publication in Nature and open-sourcing through Xbox/Azure AI Foundry, with both announced the next day. via Microsoft Research
By the numbers
- ~29: Thinking Machines Lab launch headcount
- ~40K / ~300: R1-1776 retraining prompts / censored topics
- 30B / 204: Step-Video-T2V parameters / max generated frames
- 8B / ~67%: DeepHermes-3 size / MATH score
- Most-mentioned company: Perplexity
Compiled by AI Feed’s editor from verified web sources for 18 February 2025.