Commercial products, software frameworks, and recent research
Commercial Products
Available today
CMM-D / CMM-B
Samsung
CXL 2.0
16 TB pools
60 GB/s
~600 ns latency
CXL memory expander with large-capacity DRAM pools.
Dumb DRAM: no compute, no intelligent caching
Apollo + GISMO
XConn + MemVerge
CXL 3.0
100 TiB pools
NVIDIA Dynamo
CXL memory pooling integrated with NVIDIA's inference stack.
Memory pooling only: no attention-aware eviction
Niagara
Astera Labs
Type-3
CXL Expander
Academic Research
CXL Type-3 memory expander used in university research settings.
Expander only: no processing capability
vLLM / LMCache
Open Source
Framework
PagedAttention
CPU/SSD offload
Industry-standard inference framework with memory management.
Coarse-grained, not CXL-optimized
Recent Research
Oct-Dec 2025 (arXiv)
arXiv 2025
PNM-KV
CXL-enabled processing-near-memory that offloads token page selection to a PNM accelerator.
21.9× throughput (1M tokens)
arXiv 2025
CXL-SpecKV
FPGA-based speculative KV-cache prefetching with compression.
4-8× memory expansion
arXiv 2025
TraCT
CXL shared memory as rack-scale KV cache with direct GPU load/store and DMA.
Rack-scale KV sharing
The Gap: Nobody Has Combined
Missing pieces for truly intelligent KV-cache management
Per-KV-Head Tracking
Respecting GQA's 640 queues
EMA Attention Scoring
Smoothed eviction priority
RoPE-Aware Prefetch
Position locality exploitation
Controller-Resident Intelligence
Logic in the CXL endpoint
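As a rough illustration of per-KV-head tracking, the sketch below flattens a (layer, KV head) pair into one of 640 queue indices and keeps a separate page list per queue. The 80-layer by 8-KV-head split is an assumption chosen only because it multiplies to 640; the source states the total, not the model shape. `queue_id` and `HeadQueues` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, List

# Assumed model shape: 80 layers x 8 KV heads = 640 queues.
# The source gives only the total (640); this split is illustrative.
NUM_LAYERS = 80
NUM_KV_HEADS = 8
NUM_QUEUES = NUM_LAYERS * NUM_KV_HEADS


def queue_id(layer: int, kv_head: int) -> int:
    """Flatten (layer, kv_head) into a single queue index."""
    if not (0 <= layer < NUM_LAYERS and 0 <= kv_head < NUM_KV_HEADS):
        raise ValueError("layer or kv_head out of range")
    return layer * NUM_KV_HEADS + kv_head


class HeadQueues:
    """One independent list of resident KV pages per (layer, KV head)."""

    def __init__(self) -> None:
        # queue index -> page ids currently held for that head
        self.pages: Dict[int, List[int]] = defaultdict(list)

    def add_page(self, layer: int, kv_head: int, page: int) -> None:
        self.pages[queue_id(layer, kv_head)].append(page)

    def resident(self, layer: int, kv_head: int) -> List[int]:
        # Return a copy so callers cannot mutate the queue in place.
        return list(self.pages[queue_id(layer, kv_head)])


queues = HeadQueues()
queues.add_page(layer=3, kv_head=2, page=17)
print(NUM_QUEUES, queue_id(3, 2), queues.resident(3, 2))
```

Keeping the queues separate means a page that matters to one head can stay resident even when every other head has stopped attending to it.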
Closest: PNM-KV, but it does token selection, not per-head eviction with attention weighting.
The opportunity is more fine-grained and model-architecture-aware.
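One way to picture per-head eviction with attention weighting is an exponential moving average (EMA) of the attention mass each cached block receives, evicting the block with the lowest smoothed score. This is a minimal sketch under assumptions, not a description of any product or paper listed here: `EmaHeadCache`, the `alpha` value, and the initial score given to a new block are all hypothetical.

```python
from typing import Dict, Optional


class EmaHeadCache:
    """EMA-scored block cache for a single KV head (illustrative only)."""

    def __init__(self, capacity: int, alpha: float = 0.2) -> None:
        if capacity < 1 or not 0.0 < alpha <= 1.0:
            raise ValueError("capacity must be >= 1 and alpha in (0, 1]")
        self.capacity = capacity
        self.alpha = alpha  # assumed smoothing factor; higher reacts faster
        self.scores: Dict[int, float] = {}  # block id -> smoothed score

    def observe(self, attention: Dict[int, float]) -> None:
        """Fold one decode step's attention mass per block into the EMA.

        Blocks missing from `attention` are treated as receiving 0.0,
        so their scores decay gradually instead of dropping at once.
        """
        for block in self.scores:
            seen = attention.get(block, 0.0)
            self.scores[block] = (
                (1.0 - self.alpha) * self.scores[block] + self.alpha * seen
            )

    def admit(self, block: int, initial: float = 1.0) -> Optional[int]:
        """Insert a block; return the evicted block id, or None.

        `initial` is an assumed grace score so that a freshly admitted
        block is not the immediate eviction victim.
        """
        victim = None
        if block not in self.scores and len(self.scores) >= self.capacity:
            victim = min(self.scores, key=self.scores.get)
            del self.scores[victim]
        self.scores.setdefault(block, initial)
        return victim


cache = EmaHeadCache(capacity=2, alpha=0.5)
cache.admit(0)
cache.admit(1)
cache.observe({0: 1.0})   # block 0 is attended, block 1 is not
print(cache.admit(2))     # evicts the block with the lowest smoothed score
```

In a per-head design there would be one such instance per KV head. Because the score is smoothed, a single step with low attention does not immediately push a block out, which is the anti-thrashing effect the EMA is meant to provide.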
White Space
CXL Controller-Resident Intelligence
✓
Per-KV-Head Eviction
Track 640 GQA queues independently, evict at head granularity
✓
EMA-Based Scoring
Smooth attention scores over time, prevent thrashing
✓
RoPE Locality Prefetch
Exploit position encoding structure for predictive fetch
✓
Model-Architecture Aware
Understands transformer structure, not just memory pages
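The RoPE locality item can be sketched as a neighbor-window prefetch: when a block of token positions is attended, blocks at adjacent positions that are not yet resident become prefetch candidates. The source does not define the prefetch policy, so the window rule, the `radius` parameter, and the function name `prefetch_candidates` are assumptions. The underlying premise, that blocks close in position to a hot block are likely to be needed soon, is likewise assumed rather than taken from the source.

```python
from typing import Iterable, List, Set


def prefetch_candidates(
    hot_blocks: Iterable[int],
    resident: Set[int],
    num_blocks: int,
    radius: int = 1,
) -> List[int]:
    """Return non-resident blocks within `radius` positions of any hot block.

    Block ids are assumed to follow token-position order, so block b
    covers positions just before those of block b + 1.
    """
    wanted: Set[int] = set()
    for b in hot_blocks:
        lo = max(0, b - radius)
        hi = min(num_blocks, b + radius + 1)  # exclusive upper bound
        for neighbor in range(lo, hi):
            if neighbor not in resident:
                wanted.add(neighbor)
    return sorted(wanted)


# Block 5 was just attended and is resident; its neighbors are not.
print(prefetch_candidates([5], resident={5}, num_blocks=10, radius=1))
```

A controller-resident version would run this per KV head, so the prefetch window could differ between heads that attend locally and heads that attend to distant positions.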