Result: Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits
Title:
Scalable Processing-Near-Memory for 1M-Token LLM Inference: CXL-Enabled KV-Cache Management Beyond GPU Limits
Authors:
Source:
2025 34th International Conference on Parallel Architectures and Compilation Techniques (PACT), pp. 1-13, Nov. 2025
Relation:
2025 34th International Conference on Parallel Architectures and Compilation Techniques (PACT)
Database:
IEEE Xplore Digital Library