Skip to content
smol.ai news · Newsletters

not much happened today

**Inference optimization** is increasingly architectural, with **EAGLE 3.1** improving speculative decoding and long-context handling, collaborating with **vLLM** and **TorchSpec**. **Perplexity** open-sourced a rebuilt **Unigram tokenizer** cutting CPU use by **5–6×** and achieving **63 µs at 514 tokens**. **Qwen3.5**