Source February 24, 2025 · Daily Brief

AI Daily Brief — 24 February 2025

Monday delivered Anthropic’s biggest model release since 3.5 Sonnet plus the year’s most consequential open-source infrastructure drop. Claude 3.7 Sonnet shipped as the industry’s first hybrid reasoning model: one model with near-instant responses or visible step-by-step extended thinking, with API users setting a thinking-token budget up to 128K to trade speed/cost for quality. Available across all Claude tiers, the Anthropic API, Amazon Bedrock, and Google Cloud Vertex AI. Same pricing as 3.5 Sonnet — $3 input / $15 output per million tokens (including thinking tokens). 70.3% SWE-bench Verified, ahead of OpenAI o1 and DeepSeek R1 at launch. 81.2% TAU-bench retail. 84.8% GPQA Diamond in extended thinking mode, 96.5% physics subscore. 45% fewer unnecessary refusals than its predecessor. Alongside the model, Anthropic launched Claude Code as a limited research preview — a CLI agentic terminal tool that delegates substantial engineering tasks (searching/editing code, running tests, committing) to Claude. DeepSeek opened its five-day Open Source Week with FlashMLA — a Hopper-optimized MLA decoding kernel for variable-length sequences, hitting up to 3,000 GB/s in memory-bound and 580 TFLOPS in compute-bound workloads on H800 SXM5. xAI rolled out Grok 3 voice mode broadly to Premium+ and SuperGrok subscribers.

Who shipped

The deepest single-day product shipping of the year so far. Anthropic shipped Claude 3.7 Sonnet + Claude Code + system card. DeepSeek shipped FlashMLA. xAI shipped a voice-mode tier expansion. OpenAI, Google DeepMind, Meta, and Mistral made no dated launches.

Open-source pulse

FlashMLA opens a five-day batched release week from DeepSeek. The Hopper-specific MLA kernel directly attacks one of the binding bottlenecks for production R1 / V3 serving: variable-length-sequence decoding throughput. Perplexity’s R1-1776, Stepfun’s Step-Video-T2V, and Nous DeepHermes-3 from the prior week continued spreading.

Money, infra & hardware

Claude 3.7 Sonnet’s launch with the same $3/$15 pricing as 3.5 Sonnet — including thinking tokens billed as output — set the pricing frame for hybrid reasoning. DeepSeek’s FlashMLA performance numbers (~83% of theoretical bandwidth) directly informed cost-of-inference debates for the rest of the week.

By the numbers

70.3% — Claude 3.7 Sonnet SWE-bench Verified (state of the art at launch)
200K / 64K / 128K — context / default output / extended-thinking budget ceiling
$3 / $15 per million tokens — input / output (same as 3.5 Sonnet)
45% — drop in unnecessary refusals vs predecessor
3,000 GB/s / 580 TFLOPS — FlashMLA H800 SXM5 memory-bound / compute-bound
Most-mentioned lab: Anthropic

Compiled by AI Feed’s editor from verified web sources for 24 February 2025.

Top stories

Who shipped

Open-source pulse

Money, infra & hardware

By the numbers