How we monitor internal coding agents for misalignment
The Batch AI News and Insights: Should there be a Stack Overflow for AI coding agents to share their learnings with each other?
Anthropic officially told by DOD that it’s a supply chain risk, ‘cancel ChatGPT’ trend is growing after OpenAI signs a deal with the US military, and…
GPT-5.4 is now available in Windsurf with multiple reasoning effort levels. For a limited time, self-serve users enjoy promotional pricing starting at 1x credits.
Reasoning models struggle to control their chains of thought, and that’s good
An action-packed episode!
OpenAI's GPT-5.3-Codex-Spark, an ultra-fast model optimized for real-time coding, is now available in Windsurf's Arena Mode Fast and Hybrid battle groups.
RT Caleb: We recently shipped a Data Exploration Agent in the @huggingface Dataset Viewer 💽 • powered by @OpenAI gpt-oss-120b 🤖 • served by @GroqInc for super fast inference…
OpenAI to test ads in ChatGPT as it burns through billions, The Drama at Thinking Machines, STEM: Scaling Transformers with Embedding Modules
OpenAI to test ads in ChatGPT as it burns through billions, Sequoia to invest in Anthropic, Zhipu AI breaks US chip reliance, The Drama at Thinking…
The Batch AI News and Insights: How can businesses go beyond using AI for incremental efficiency gains to create transformative impact?
Open models can be used with OpenAI's Codex CLI through Ollama. Codex can read, modify, and execute code in your working directory using models such as…
GPT-5.2-Codex is now available in Windsurf with multiple reasoning effort levels. For a limited time, enjoy discounts on credit usage.
The Batch AI News and Insights: As amazing as LLMs are, improving their knowledge today involves a more piecemeal process than is widely appreciated.
GPT-5.1, GPT-5.1-Codex, and GPT-5.1-Codex Mini deliver a solid upgrade for agentic coding with variable thinking and improved steerability.
Day Zero Support for OpenAI Open Safety Model
Ollama is partnering with OpenAI and ROOST (Robust Open Online Safety Tools) to bring the latest gpt-oss-safeguard reasoning models to users for safety classification tasks. gpt-oss-safeguard…
Putting the AI in Charge
Ollama partners with OpenAI to bring gpt-oss to Ollama and its community.
RT Kai-Fu Lee: The biggest revelation from Deepseek is that Open Source has won. For a 1% difference in performance, it will be difficult for OpenAI to…
o1 scores the top result on aider's new multi-language, more challenging coding benchmark.
We are proud to present the latest model ⚡️Yi-Lightning ⚡️ now #6 in the world, higher than the original GPT-4o released 5 months ago. Also humbled…
Preliminary benchmark results for the new OpenAI o1 models.