X · @rasbt
New article: a visual tour of recent LLM architecture advances, from Gemma 4 to DeepSeek V4.
I focus on long-context efficiency tweaks like KV sharing, per-layer embeddings, layer-wise attention budgets, compressed attention, and mHC.
Link: https://magazine.sebastianraschka.com/p/recent-developments-in-llm-architectures