KV Cache Offloading for LLM Inference

Distributed endpoint architecture with intelligent caching,
attention-aware eviction, and CXL.mem acceleration

Per-Head Eviction

EMA Attention Scoring

RoPE-Aware Prefetch

Sam Pooni

San Jose, CA