arXiv cs.CL · Papers

ReFreeKV: Towards Threshold-Free KV Cache Compression

arXiv:2502.16886v4 Announce Type: replace

Abstract: To reduce memory consumption during LLM inference, a handful of methods have been proposed for KV cache pruning. While these techniques can accomplish lossless memory reduction on many datasets, they often hinge on an under-emphasized condition: an input/domain-specif