CXL + Processing-Near-Memory
1M tokens
21.9× throughput
Offloads token selection to a PNM accelerator located in CXL memory. Uses a steady-token mechanism.
Gap: Requires custom silicon; no per-head EMA (exponential moving average) tracking
FPGA + CXL
Speculative prefetch
Compression
Predicts future token accesses and prefetches them speculatively. FPGA-based.
Gap: Prefetching is speculative rather than attention-aware; FPGA prototype only
CXL shared memory
Rack-scale
GPU-CXL DMA
Uses CXL as the KV transfer substrate, bypassing the NIC. Prefix-aware caching.
Gap: Focuses on sharing, not intelligent eviction
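
The gaps above refer to per-head EMA tracking and attention-aware eviction without showing what that could look like. The following is a minimal sketch under stated assumptions, not the method of any system listed here: the class name `PerHeadEmaTracker`, the smoothing factor `alpha`, and the max-over-heads eviction score are all hypothetical choices made for illustration.

```python
from typing import List


class PerHeadEmaTracker:
    """Hypothetical sketch: one EMA of attention weight per (head, token)."""

    def __init__(self, num_heads: int, alpha: float = 0.2) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # assumed smoothing factor, not from the source
        self.ema: List[List[float]] = [[] for _ in range(num_heads)]

    def update(self, attn: List[List[float]]) -> None:
        """attn[h][t] is the weight head h gave token t in this decode step."""
        if len(attn) != len(self.ema):
            raise ValueError("expected one attention row per head")
        for head_ema, head_attn in zip(self.ema, attn):
            for t, w in enumerate(head_attn):
                if t < len(head_ema):
                    # Standard EMA update for a token seen before.
                    head_ema[t] = self.alpha * w + (1 - self.alpha) * head_ema[t]
                else:
                    # First observation of a new token seeds its EMA.
                    head_ema.append(w)

    def eviction_candidates(self, k: int) -> List[int]:
        """Return the k tokens whose best per-head EMA is lowest."""
        num_tokens = min(len(h) for h in self.ema)
        # Assumption: a token is worth keeping if ANY head still attends to it.
        score = [max(h[t] for h in self.ema) for t in range(num_tokens)]
        return sorted(range(num_tokens), key=lambda t: score[t])[:k]


tracker = PerHeadEmaTracker(num_heads=2, alpha=0.5)
tracker.update([[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]])
tracker.update([[0.6, 0.1, 0.1, 0.2], [0.1, 0.1, 0.6, 0.2]])
print(tracker.eviction_candidates(2))  # [1, 3]
```

Token 2 scores low on head 0 but is retained because head 1 attends to it heavily. A tracker that averaged across heads would blur this distinction, which is the motivation for keeping the statistics per head.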