Treffer: Efficient LLMs Inference via Similarity-Aware KV Cache Merging with Bias Calibration
Title:
Treffer: Efficient LLMs Inference via Similarity-Aware KV Cache Merging with Bias Calibration
Authors:
Source:
2025 21st International Conference on Mobility, Sensing and Networking (MSN), pp. 349-358, Dec. 2025
Relation:
2025 21st International Conference on Mobility, Sensing and Networking (MSN)
Database:
IEEE Xplore Digital Library