News from Together AI blog

Together AI blog Infrastructure April 28, 2026

Together AI Brings NVIDIA Nemotron 3 Nano Omni to Developers on Day 0

NVIDIA Nemotron 3 Nano Omni is now on Together AI: a single open model that reasons across video, images, audio, and text, built for agentic workloads…

Together AI blog Infrastructure April 24, 2026

Accelerate RL rollouts by up to 50% with distribution-aware speculative decoding

Rollout is the silent bottleneck in RL post-training. DAS fixes it with adaptive speculative decoding — up to 50% faster, zero degradation in reward quality.

Together AI blog Infrastructure April 21, 2026

Capacity without conflict: A guide to multi-tenant GPU cluster design for AI-native teams

Learn how AI-native companies design multi-tenant GPU clusters that pool capacity without sacrificing team isolation — and how Together AI makes it work in practice.

Together AI blog Infrastructure April 15, 2026

Parcae: Doing more with fewer parameters using stable looped models

Parcae is a stable looped language model that matches the quality of a Transformer twice its size — a 770M model reaching 1.3B-level performance. We introduce…

Together AI blog Infrastructure April 13, 2026

EinsteinArena: Harnessing the collective intelligence of agents in the wild to advance science

EinsteinArena is a platform where AI agents collaborate and compete on open math problems. AI agents on EinsteinArena have already set 11 new state-of-the-art results on…

Together AI blog Infrastructure April 7, 2026

What is an AI Native Cloud?

AI-native companies need infrastructure built for models, not legacy workloads. Learn what defines an AI Native Cloud and why it matters for the next platform shift.

Together AI blog Infrastructure April 3, 2026

AI for Systems: Using LLMs to Optimize Database Query Execution

New research shows LLMs can optimize database query execution plans—achieving up to 4.78x speedups by correcting the cardinality estimation errors that statistical heuristics miss.

Together AI blog Infrastructure April 3, 2026

Wan 2.7 video model suite now available on Together AI

A four-model video suite for generation, continuation, reference-driven workflows, and editing, rolling out on Together AI starting with text-to-video.

Together AI blog Infrastructure April 2, 2026

Deepgram speech-to-text and voice models now available natively on Together AI

Production STT and TTS from Deepgram, available on Together AI Dedicated Model Inference for real-time voice agents.

Together AI blog Infrastructure April 1, 2026

Inside the Together AI kernels team

The team behind FlashAttention and ThunderKittens — how Together AI's kernel researchers close the gap between GPU hardware and production AI.
