CXL 3.0 | UCIe | UEC
Technical Reference

KV Cache Offloading for LLM Inference

Distributed endpoint architecture with intelligent caching,
attention-aware eviction, and CXL.mem acceleration

Per-Head Eviction
EMA Attention Scoring
RoPE-Aware Prefetch
Sam Pooni
San Jose, CA