arXiv cs.AI · Papers

MATCH: Modulating Attention via In-Context Retrieval for Long-Context Transformers

arXiv:2606.29844v1 · Announce Type: cross

Abstract: The quadratic computational cost of traditional attention mechanisms poses a major bottleneck to the scalability and practical deployment of large language models (LLMs), particularly in long-context scenarios. To improve efficiency, existing approaches often enforce ri