L_dram
DRAM access latency via CXL.mem path
~150–300 ns
L_flash
Flash access latency (NVMe behind endpoint controller)
~10–20 μs
L_recompute
Cost of regenerating evicted KV entries
~50–200 ms
Comparison to PCIe Baseline
PCIe swap path latency components:
🚨 Page fault
🔧 Driver intervention
📋 DMA setup
📡 PCIe transfer
🔓 Completion interrupt
PCIe Path
~13 μs
fault + driver + DMA + transfer + IRQ
CXL.mem Direct
~250 ns
load/store memory semantics
65× Lower Latency
Under Load, PCIe Latency Degrades
Interrupt Coalescing
+10–50 μs
OS Scheduling
+100 μs jitter
Substituting Values
L_eff = (0.85 × 250ns) + (0.14 × 15μs) + (0.01 × 100ms)
L_eff = 170ns + 2,100ns + 1,000,000ns
L_eff = 1,002,270 ns
Effective Latency
≈ 1.0 ms
💡
Recompute dominates. Even at just 1% miss rate, recompute contributes 99.8% of total latency.
This is why intelligent caching that prevents misses is critical.
Intelligent Caching Prevents Thrashing
The endpoint's intelligent caching prevents thrashing—high-value entries never evict,
eliminating the miss cascades that cause queue buildup. EMA-based scoring ensures frequently-accessed
KV entries remain in fast DRAM tier while cold entries gracefully demote to flash.