Treffer: Efficient LLMs Inference via Similarity-Aware KV Cache Merging with Bias Calibration

Title:
Treffer: Efficient LLMs Inference via Similarity-Aware KV Cache Merging with Bias Calibration
Source:
2025 21st International Conference on Mobility, Sensing and Networking (MSN), pp. 349-358, Dec. 2025
Database:
IEEE Xplore Digital Library