Infrastructure news

Cursor Infrastructure June 5, 2026

Direct agents with visual prompts in Design Mode

Point, draw, or narrate UI changes in the browser while agents edit the code underneath.

Pinecone Infrastructure June 5, 2026

Nexus in the Wild: Real Results from Our Early Access Customers

X · @llama_index Infrastructure June 4, 2026

Most AI pipelines are only as good as the data we provide them with, and that usually means PDFs or other unstructured documents. Contracts, invoices, reports... All…

Ollama (via openrss) Infrastructure June 4, 2026

NVIDIA Nemotron 3 Ultra

NVIDIA Nemotron 3 Ultra is built for high-throughput reasoning and long-running agent workflows.

NVIDIA Nemotron Infrastructure June 3, 2026

NVIDIA Enables the Next Era Of Physical AI Research With Agent Skills For Autonomous Vehicles, Robotics And Vision AI

At CVPR, NVIDIA is unveiling new physical AI agent skills that help researchers and developers speed the development of autonomous vehicles, robots and vision AI systems.…

Weaviate Infrastructure June 3, 2026

Engram is now Generally Available

Engram, Weaviate's managed memory and context service for agentic applications, is now generally available.

NVIDIA Nemotron Infrastructure June 2, 2026

Why Financial Institutions Are Converging on Transaction Foundation Models to Build Their Own Intelligence

Financial institutions have spent years building AI: fraud models, credit models, recommendation engines and risk systems. While this sprawl of task-specific models has been effective, it’s…

Pinecone Infrastructure June 2, 2026

Inside AskData: How We Slashed Token Consumption by Over 90%

Together AI blog Infrastructure May 29, 2026

How Together AI built the world’s fastest speech-to-text stack

Together AI built the fastest speech-to-text stack on Artificial Analysis by treating ASR as a full-path systems problem, not just a GPU inference problem.

Ollama (via openrss) Infrastructure May 28, 2026

OpenJarvis: a local-first personal AI is now available to run with Ollama

OpenJarvis v1.0 is now available: an open-source framework for building personal AI agents that run on your own hardware, with Ollama support built-in.

Weaviate Infrastructure May 28, 2026

Leveling up Weaviate Cloud security: Expanding role-based access control for Cloud console

Weaviate Cloud now supports more granular role-based access control with new Editor and Viewer roles for improved security and organizational management.

Pinecone Infrastructure May 27, 2026

Turn Azure Data into an AI-Ready Knowledge Base

Weaviate Infrastructure May 21, 2026

Build a Coding Assistant with Weaviate MCP: RAG over Code & Docs

Use Weaviate's built-in MCP server to give Claude Code, Cursor, and VS Code hybrid search over your codebase and docs. No glue code.

Replicate Infrastructure May 21, 2026

How to prompt Grok Imagine Video 1.5

Grok Imagine Video 1.5 is the most exciting video model release from xAI. You can generate realistic video with synchronized audio in a single pass, capable…

NVIDIA Nemotron Infrastructure May 19, 2026

NVIDIA and Google Cloud Empower the Next Wave of AI Builders

At this year’s Google I/O conference, NVIDIA and Google Cloud are accelerating the work of more than 100,000 developers in the companies’ joint developer community, which…

Together AI blog Infrastructure May 19, 2026

Benchmarking inference at scale: coding agents

Real-world inference benchmarks for coding agents: 31% more TPS than TensorRT-LLM, 2× better TTFT at saturation, and 76% lower cost than Claude Opus 4.6.

NVIDIA Nemotron Infrastructure May 18, 2026

NVIDIA CEO Jensen Huang at Dell Technologies World: ‘Demand Is Going Parabolic, Utterly Parabolic’

Agentic AI inference at one-tenth the cost per token with NVIDIA Vera Rubin NVL72. Agent sandboxes run 50% faster on NVIDIA Vera than traditional CPUs —…

Cursor Infrastructure May 18, 2026

Introducing Composer 2.5

A substantial improvement in intelligence and behavior over Composer 2, particularly on long-horizon agentic tasks.

Together AI blog Infrastructure May 15, 2026

Together AI and Pearl Research Labs Team Up to Reduce the Cost of AI Inference

Together AI partners with Pearl Research Labs to launch a discounted Pearl-powered inference endpoint for Gemma-4-31B-it-pearl, using Proof of Useful Work to turn AI workloads into…

Together AI blog Infrastructure May 14, 2026

Violin: An open-source video translation skill that breaks language barriers

Violin is an open-source AI video translation tool that combines speech recognition, LLM translation, and text-to-speech to make video content accessible across languages.

Weaviate Infrastructure May 14, 2026

Text Analysis for Hybrid Search: Tokenization, Stopwords & Accent Folding

Tokenization makes or breaks hybrid search. See how Weaviate's accent folding, custom stopwords, and /v1/tokenize endpoint power multilingual BM25.

Windsurf (Codeium) Infrastructure May 12, 2026

Opus 4.7 (fast mode) is now available in Windsurf

Claude Opus 4.7 (fast mode) is now available in Windsurf with the full intelligence of Opus 4.7 and ~2.5x higher output speeds.

Together AI blog Infrastructure May 12, 2026

Introducing voice finder: a new tool to quickly find the right voice for your app from 600+ voices

Voice finder helps developers search, match, filter, and audition 600+ voices across Together AI TTS models using natural-language prompts or uploaded audio samples.

Together AI blog Infrastructure May 11, 2026

Serving DeepSeek-V4: why million-token context is an inference systems problem

DeepSeek-V4 makes million-token context a serving-systems problem. Together AI explores the inference work behind V4 on NVIDIA HGX B200, including compressed KV layouts, prefix caching, kernel…

Infrastructure 333 stories