@rasbt on X

New article: a visual tour of recent LLM architecture advances, from Gemma 4 to DeepSeek V4.

I focus on long-context efficiency tweaks like KV sharing, per-layer embeddings, layer-wise attention budgets, compressed attention, and mHC.

Link: https://magazine.sebastianraschka.com/p/recent-developments-in-llm-architectures