News from Together AI blog

Together AI blog Infrastructure May 29, 2026

How Together AI built the world’s fastest speech-to-text stack

Together AI built the fastest speech-to-text stack on Artificial Analysis by treating ASR as a full-path systems problem, not just a GPU inference problem.

Together AI blog Infrastructure May 19, 2026

Benchmarking inference at scale: coding agents

Real-world inference benchmarks for coding agents: 31% more TPS than TensorRT-LLM, 2× better TTFT at saturation, and 76% lower cost than Claude Opus 4.6.

Together AI blog Infrastructure May 15, 2026

Together AI and Pearl Research Labs Team Up to Reduce the Cost of AI Inference

Together AI partners with Pearl Research Labs to launch a discounted Pearl-powered inference endpoint for Gemma-4-31B-it-pearl, using Proof of Useful Work to turn AI workloads into…

Together AI blog Infrastructure May 14, 2026

Violin: An open-source video translation skill that breaks language barriers

Violin is an open-source AI video translation tool that combines speech recognition, LLM translation, and text-to-speech to make video content accessible across languages.

Together AI blog Infrastructure May 12, 2026

Introducing voice finder — a new tool to quickly find the right voice for your app from 600+ voices

Voice finder helps developers search, match, filter, and audition 600+ voices across Together AI TTS models using natural-language prompts or uploaded audio samples.

Together AI blog Infrastructure May 11, 2026

Serving DeepSeek-V4: why million-token context is an inference systems problem

DeepSeek-V4 makes million-token context a serving-systems problem. Together AI explores the inference work behind V4 on NVIDIA HGX B200, including compressed KV layouts, prefix caching, kernel…

Together AI blog Infrastructure May 8, 2026

Deploy and run inference on any model from Hugging Face

Learn how to deploy any Hugging Face model in one session using Goose and Together's Dedicated Container Inference. Skip the setup complexity — one prompt gets…

Together AI blog Infrastructure May 4, 2026

Foundational research powering efficient inference at scale

As AI moves from research to production, the challenge for AI-native teams shifts from building models to running them — efficiently, reliably, and at scale.

Together AI blog Infrastructure April 30, 2026

Announcing Together AI and Adaption Partnership

Together AI and Adaption partner to bring Together Fine-Tuning natively into Adaptive Data, helping teams optimize datasets, run fine-tuning, evaluate results, and deploy stronger open models.

Together AI blog Infrastructure April 29, 2026

DeepSeek-V4 Pro now available on Together AI

DeepSeek-V4 Pro is now available on Together AI with 512K context, controllable reasoning modes, and cached-input pricing for long-context reasoning workloads like code agents, document intelligence,…
