smol.ai news
not much happened today
**Inference optimization** is increasingly architectural: **EAGLE 3.1**, in collaboration with **vLLM** and **TorchSpec**, improves speculative decoding and long-context handling. **Perplexity** open-sourced a rebuilt **Unigram tokenizer** that cuts CPU use by **5–6×** and reaches **63 µs at 514 tokens**. **Qwen3.5**