arXiv cs.CL
ReFreeKV: Towards Threshold-Free KV Cache Compression
arXiv:2502.16886v4 Announce Type: replace
Abstract: To reduce memory consumption during LLM inference, a handful of methods have been proposed for KV cache pruning. While these techniques can accomplish lossless memory reduction on many datasets, they often hinge on an under-emphasized condition: an input/domain-specif