What exists today vs. what my research addresses
PNM-KV achieves an impressive 21.9× throughput improvement with processing-near-memory, but it operates at token granularity: it does not perform per-head eviction weighted by attention. Our approach is finer-grained and aware of the model architecture, with the intelligence residing in the CXL controller itself.
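To make the granularity distinction concrete, here is a minimal Python sketch contrasting the two eviction styles. It rests on several assumptions that are not in the text above: the scoring signal is an accumulated attention weight per head and token, the token-granularity function is a generic stand-in (not PNM-KV's actual policy), and the function names and toy numbers are hypothetical. It illustrates only the selection logic, not how a CXL controller would implement it.

```python
from typing import List

# attn[h][t]: hypothetical accumulated attention weight that head h has paid
# to cached token t. This scoring signal is an assumption for illustration.
Scores = List[List[float]]


def evict_token_granularity(attn: Scores, budget: int) -> List[List[int]]:
    """Keep the same `budget` tokens for every head, ranked by weight summed over heads."""
    num_tokens = len(attn[0])
    total = [sum(head[t] for head in attn) for t in range(num_tokens)]
    ranked = sorted(range(num_tokens), key=lambda t: total[t], reverse=True)
    keep = sorted(ranked[:budget])
    return [keep[:] for _ in attn]  # every head gets an identical keep-list


def evict_per_head(attn: Scores, budget: int) -> List[List[int]]:
    """Keep `budget` tokens independently per head, ranked by that head's own weights."""
    kept = []
    for head in attn:
        ranked = sorted(range(len(head)), key=lambda t: head[t], reverse=True)
        kept.append(sorted(ranked[:budget]))
    return kept


# Toy example: two heads, four cached tokens (made-up numbers).
attn = [
    [0.70, 0.05, 0.05, 0.20],  # head 0 attends mostly to tokens 0 and 3
    [0.05, 0.50, 0.40, 0.05],  # head 1 attends mostly to tokens 1 and 2
]
print(evict_token_granularity(attn, 2))  # [[0, 1], [0, 1]]
print(evict_per_head(attn, 2))           # [[0, 3], [1, 2]]
```

In this toy case the token-level policy drops token 3 for head 0 and token 2 for head 1, even though each is the second most attended entry for that head; the per-head policy retains them within the same per-head budget.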