Memory Access Path Comparison

CXL.mem Direct Path vs CPU-Mediated PCIe Swap

PCIe Swap Path (~6,000 ns)

1. GPU Page Fault: TLB miss → fault signal (750 ns)
2. CPU Interrupt: MSI-X → context switch (2,000 ns)
3. Fault Handler: Kernel page lookup (750 ns)
4. DMA Setup: Descriptor allocation (500 ns)
5. PCIe Transfer: 4 KiB @ 32 GB/s (230 ns)
6. TLB Update: Invalidate + install (750 ns)

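
As a quick arithmetic check on the PCIe Transfer step, the raw wire time for one 4 KiB page at 32 GB/s can be worked out directly. This is only a sketch: it assumes decimal gigabytes (so 32 GB/s is 32 bytes per nanosecond), and the constant names are illustrative rather than taken from the source.

```python
# Raw serialization time for one 4 KiB page at the quoted link rate.
PAGE_BYTES = 4 * 1024        # 4 KiB page
LINK_BYTES_PER_NS = 32       # 32 GB/s == 32 bytes/ns (assumes decimal GB)
LISTED_TRANSFER_NS = 230     # latency shown for the PCIe Transfer step

raw_wire_ns = PAGE_BYTES / LINK_BYTES_PER_NS
unitemized_ns = LISTED_TRANSFER_NS - raw_wire_ns

print(f"raw wire time: {raw_wire_ns:.0f} ns")
print(f"remainder of the listed 230 ns: {unitemized_ns:.0f} ns")
```

Under these assumptions, serialization alone accounts for 128 ns of the listed 230 ns. The source does not itemize the remainder.
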
CXL.mem Direct Path (~250 ns)

1. GPU Load: Shader issues load (8 ns)
2. CXL Request: M2S MemRd formation (15 ns)
3. PHY Transit: SerDes + switch (50 ns)
4. Endpoint DRAM: DDR5 access (100 ns)
5. Return Path: PHY + CXL response (65 ns)
6. GPU Complete: Register writeback (12 ns)

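
The per-step latencies of the two paths can be totaled to compare them against the headline figures. The sketch below uses only the step values listed for each path; the dictionary and variable names are illustrative.

```python
# Per-step latencies in nanoseconds, as listed for each path.
pcie_swap_steps_ns = {
    "GPU Page Fault": 750,
    "CPU Interrupt": 2000,
    "Fault Handler": 750,
    "DMA Setup": 500,
    "PCIe Transfer": 230,
    "TLB Update": 750,
}
cxl_direct_steps_ns = {
    "GPU Load": 8,
    "CXL Request": 15,
    "PHY Transit": 50,
    "Endpoint DRAM": 100,
    "Return Path": 65,
    "GPU Complete": 12,
}

pcie_total_ns = sum(pcie_swap_steps_ns.values())
cxl_total_ns = sum(cxl_direct_steps_ns.values())

print(f"PCIe swap, itemized steps: {pcie_total_ns} ns")
print(f"CXL.mem direct, itemized steps: {cxl_total_ns} ns")
```

The CXL.mem steps add up to exactly the ~250 ns headline. The itemized PCIe steps add up to 4,980 ns, below the ~6,000 ns headline; the source does not say what makes up the difference.
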
Latency Comparison (log scale)

CXL.mem (DRAM): 250 ns (baseline)
PCIe Swap (Best): 3,000 ns (12× slower)
PCIe Swap (Typical): 6,000 ns (24× slower)
PCIe + NVMe: 25,000 ns (100× slower)
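
The slowdown factors follow from dividing each latency by the CXL.mem baseline. A minimal sketch, using the four figures above; the `log10` values are included only to show why a log scale suits a chart spanning two orders of magnitude, and the names are illustrative.

```python
import math

# Latencies in nanoseconds from the comparison above.
latencies_ns = {
    "CXL.mem (DRAM)": 250,
    "PCIe Swap (Best)": 3000,
    "PCIe Swap (Typical)": 6000,
    "PCIe + NVMe": 25000,
}

baseline_ns = latencies_ns["CXL.mem (DRAM)"]

# Slowdown relative to the CXL.mem baseline.
slowdown = {name: ns / baseline_ns for name, ns in latencies_ns.items()}

# Position on a log10 axis, as a log-scale bar chart would place each entry.
log_position = {name: math.log10(ns) for name, ns in latencies_ns.items()}

for name in latencies_ns:
    print(f"{name}: {slowdown[name]:.0f}x, log10(ns) = {log_position[name]:.2f}")
```
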

Key Insight: CPU Elimination

CXL.mem removes the CPU from the critical path entirely: no interrupts, no context switches, no DMA setup. The GPU issues a load instruction and receives data directly from endpoint DRAM, with coherence managed in hardware.
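
To gauge how much of the PCIe swap path is spent on the CPU work that CXL.mem avoids, the step latencies can be split by who does the work. This is a rough sketch: treating the CPU Interrupt, Fault Handler, and DMA Setup steps as CPU-mediated is an assumption based on the description above, not a breakdown given by the source.

```python
# PCIe swap step latencies in nanoseconds, as listed for that path.
pcie_swap_steps_ns = {
    "GPU Page Fault": 750,
    "CPU Interrupt": 2000,
    "Fault Handler": 750,
    "DMA Setup": 500,
    "PCIe Transfer": 230,
    "TLB Update": 750,
}

# Assumption: these three steps are the ones that run on the host CPU.
CPU_MEDIATED_STEPS = {"CPU Interrupt", "Fault Handler", "DMA Setup"}

cpu_ns = sum(ns for step, ns in pcie_swap_steps_ns.items()
             if step in CPU_MEDIATED_STEPS)
total_ns = sum(pcie_swap_steps_ns.values())
cpu_share = cpu_ns / total_ns

print(f"CPU-mediated: {cpu_ns} ns of {total_ns} ns ({cpu_share:.0%})")
```

Under this classification, roughly two thirds of the itemized PCIe swap latency is CPU-mediated.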

24× lower latency than a typical PCIe swap