Recommended: Hybrid Approach
GPU (EMA updates) → CXL Controller (eviction + prefetch) → NVMe (cold storage)
The GPU computes attention and streams the scores to the CXL controller. The controller updates the EMA (exponential moving average) of those scores, makes eviction and prefetch decisions, and issues asynchronous NVMe reads.
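The loop above can be sketched in Python as follows. The sketch makes several assumptions. KV-cache blocks are identified by integer IDs. The GPU delivers one attention score per block per step. The fast tier holds `capacity` blocks. Following the description, the EMA is kept in the controller. `TieredCacheController`, `read_block_from_nvme`, `step`, and `alpha` are hypothetical names. The NVMe read is a placeholder, not a real I/O call. The policy of keeping the top-`capacity` blocks by EMA is one simple choice and not necessarily the one this design uses.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass, field


@dataclass
class TieredCacheController:
    """Hypothetical CXL-side controller: tracks an EMA of attention scores per
    KV block and decides which blocks stay in the fast tier vs. NVMe."""

    capacity: int       # max KV blocks held in the fast tier (assumed unit: blocks)
    alpha: float = 0.2  # EMA smoothing factor (illustrative value)
    ema: dict[int, float] = field(default_factory=dict)
    resident: set[int] = field(default_factory=set)

    def update(self, scores: dict[int, float]) -> None:
        """Fold one step of GPU attention scores into the per-block EMA."""
        for block in set(self.ema) | set(scores):
            x = scores.get(block, 0.0)     # blocks not attended this step decay
            prev = self.ema.get(block, x)  # a new block starts at its first score
            self.ema[block] = self.alpha * x + (1 - self.alpha) * prev

    def plan(self) -> tuple[list[int], list[int]]:
        """Return (evict, prefetch) so the `capacity` hottest blocks end up resident."""
        ranked = sorted(self.ema, key=lambda b: (-self.ema[b], b))  # ties broken by ID
        target = set(ranked[: self.capacity])
        return sorted(self.resident - target), sorted(target - self.resident)


async def read_block_from_nvme(block: int) -> bytes:
    """Stand-in for a real async NVMe read; it only yields to the event loop."""
    await asyncio.sleep(0)
    return f"kv-block-{block}".encode()


async def step(ctrl: TieredCacheController, scores: dict[int, float]):
    """One iteration: update the EMA, decide, evict, then issue reads concurrently."""
    ctrl.update(scores)
    evict, prefetch = ctrl.plan()
    ctrl.resident.difference_update(evict)  # write-back of evicted blocks omitted
    data = await asyncio.gather(*(read_block_from_nvme(b) for b in prefetch))
    ctrl.resident.update(prefetch)
    return evict, prefetch, list(data)
```

Consider a run with `capacity=2` and blocks 0 and 1 resident. A step in which block 2 receives most of the attention evicts block 1 and prefetches block 2. The EMA decays blocks that stop receiving attention, so a block that was hot earlier is eventually evicted rather than pinned. `asyncio.gather` keeps all prefetch reads in flight at once, which mirrors the asynchronous NVMe reads in the design.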