chore(ci): skip uploading artifacts on stainless-internal branches
Every primary-source story across every tracked model. Filter by clicking a chip.
chore(test): do not count install time for mock server timeout
RT Anthropic: A statement from Anthropic CEO Dario Amodei: https://www.anthropic.com/news/where-stand-department-war
Evaluating Opus 4.6 on BrowseComp, we found cases where the model recognized the test, then found and decrypted answers to it—raising questions about eval integrity in…
This post dives deep into how Claude wrote an exploit for one of the vulnerabilities it found in Firefox.
The Batch AI News and Insights: I’m thrilled to announce Context Hub, a new tool to give to your coding agents the API documentation they need…
GPT-5.4 is now available in Windsurf with multiple reasoning effort levels. For a limited time, self-serve users enjoy promotional pricing starting at 1x credits.
Anthropic releases Sonnet 4.6, Google Rolls Out Gemini 3.1 Pro, Anthropic CEO Amodei says Pentagon’s threats ‘do not change our position’ on AI
Reasoning models struggle to control their chains of thought, and that’s good
Summary: Opus 4.6 can, with a simple agent scaffold, create mostly-playable but somewhat broken CLI versions of Slay the Spire and Balatro. Intro: Last weekend I…
Update bibtex: Update the BibTeX entry of the Qwen3-Coder-Next technical report.
you can still use all our old models on our website - everything back to midjourney v1. Marcin Ignac: genAI was so much more fun when it…
The Batch AI News and Insights: We just released a Skill Builder tool to help you understand in which areas of AI you’re strong, where you…
RT Anthropic: A statement from Anthropic CEO, Dario Amodei, on our discussions with the Department of War. https://www.anthropic.com/news/statement-department-of-war
Anthropic releases Sonnet 4.6, Google Rolls Out Latest AI Model Gemini 3.1 Pro, Pentagon threatens to cut off Anthropic in AI safeguards dispute
Gemini 3.1 Pro is now available in Windsurf with Low and High thinking variants. For a limited time, enjoy promotional pricing on credit usage.
Claude Sonnet 4.6 is now available in Windsurf with limited-time promotional pricing for self-serve users: 2x credits without thinking and 3x credits with thinking.
Cohere Labs Launches Tiny Aya, Making Multilingual AI Accessible
An action-packed episode!
GLM-5 from Zhipu AI and MiniMax M2.5 are now available in Windsurf with limited-time promotional pricing. Both models are included in Arena Mode's Frontier Arena and…
A crazy packed edition of Last Week in AI! Plus some small updates.
Ollama now supports subagents and web search in Claude Code.
Carlsen will bring his iconic reputation and strategic thinking to strengthen the company's brand and mission
Most of METR’s time horizon measurements are done using two scaffolds: Triframe and ReAct. People sometimes see that we use these two scaffolds and feel skeptical…
We track 28 AI models across text, image, video, and audio domains. Each model has its own filtered news feed: click a chip above to see only that model's primary-source coverage. Tagging is automatic at ingest time and uses a strict title-only keyword match, so a benchmark post that mentions five models in its summary shows up only under whichever model is named in its headline. This avoids thin-content drift.
Text / LLMs: GPT, Claude, Gemini, Gemma, Llama, Mistral, Grok, Qwen, DeepSeek, Kimi, GLM, MiniMax, Yi, Hunyuan, Command, Phi.
Image generation: FLUX, Stable Diffusion, Midjourney, Imagen.
Video generation: Sora, Veo, Runway, Luma, Kling, Pika.
Audio / voice: ElevenLabs, Suno.
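
The title-only tagging described above could be sketched roughly as follows. This is a minimal illustration, not the site's actual implementation: the model names are copied from the four lists above, while the names `TRACKED_MODELS`, `PATTERNS`, and `tag_story`, as well as the choice of case-insensitive whole-word matching, are assumptions made for the example.

```python
import re

# Model names copied from the category lists above. The matching rules
# (case-insensitive, whole-word) are assumptions, not the site's real logic.
TRACKED_MODELS = {
    "text": ["GPT", "Claude", "Gemini", "Gemma", "Llama", "Mistral", "Grok", "Qwen",
             "DeepSeek", "Kimi", "GLM", "MiniMax", "Yi", "Hunyuan", "Command", "Phi"],
    "image": ["FLUX", "Stable Diffusion", "Midjourney", "Imagen"],
    "video": ["Sora", "Veo", "Runway", "Luma", "Kling", "Pika"],
    "audio": ["ElevenLabs", "Suno"],
}


def _compile(name):
    # \b on both sides keeps a keyword from matching inside a longer word,
    # while still matching "GPT" in "GPT-5.4" (the hyphen is a word boundary).
    return re.compile(r"\b" + re.escape(name) + r"\b", re.IGNORECASE)


# One compiled pattern per tracked model, in the order the lists give them.
PATTERNS = {
    name: _compile(name)
    for names in TRACKED_MODELS.values()
    for name in names
}


def tag_story(title, summary=""):
    """Return the tracked models named in the title.

    The summary is accepted but deliberately ignored, so a post that only
    mentions models in its body text gets no tags.
    """
    return [name for name, pattern in PATTERNS.items() if pattern.search(title)]


if __name__ == "__main__":
    print(tag_story("GPT-5.4 is now available in Windsurf"))
    print(tag_story("Benchmark roundup", summary="Claude, Gemini and Llama compared"))
```

Under these assumptions, a headline that names a model is tagged with that model, and a headline that names none is left untagged even if its summary lists several. A production tagger would likely need stricter handling for short or generic keywords such as "Yi", "Phi", and "Command", which can collide with ordinary words.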