Beyond Standard LLMs
Linear Attention Hybrids, Text Diffusion, Code World Models, and Small Recursive Transformers
Multiple-Choice Benchmarks, Verifiers, Leaderboards, and LLM Judges with Code Examples
A Detailed Look at One of the Leading Open-Source LLMs
And How They Stack Up Against Qwen3
From DeepSeek-V3 to Kimi K2: A Look At Modern LLM Architecture Design
A topic-organized collection of 200+ LLM research papers from 2025
KV caches are one of the most critical techniques for efficient LLM inference in production.
Why build LLMs from scratch? It's probably the best and most efficient way to learn how LLMs really work. Plus, many readers have told me they…
Understanding GRPO and New Insights from Reasoning Model Papers
Welcome to the next stage of large language models (LLMs): reasoning. LLMs have transformed how we process and generate text, but their success has been largely…