Memory Access Path Comparison

CXL.mem Direct Path vs CPU-Mediated PCIe Swap

PCIe Swap Path

~6,000 ns

GPU Page Fault

TLB miss → fault signal

750 ns

CPU Interrupt

MSI-X → context switch

2,000 ns

Fault Handler

Kernel page lookup

750 ns

DMA Setup

Descriptor allocation

500 ns

PCIe Transfer

4 KiB @ 32 GB/s

230 ns

TLB Update

Invalidate + install

750 ns
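The stage figures above can be tallied as a quick sanity check. The following is a minimal Python sketch: the names `PCIE_SWAP_STAGES_NS`, `total_ns`, and `wire_time_ns` are illustrative, and the values are copied from the stages listed above. The itemized stages sum to 4,980 ns, so the ~6,000 ns headline presumably includes overhead that is not itemized here (an assumption; the source does not say). Likewise, the raw wire time for 4 KiB at 32 GB/s works out to 128 ns, so the 230 ns transfer figure presumably includes protocol overhead.

```python
# Stage latencies in nanoseconds, copied from the PCIe swap path above.
PCIE_SWAP_STAGES_NS = {
    "GPU Page Fault": 750,
    "CPU Interrupt": 2000,
    "Fault Handler": 750,
    "DMA Setup": 500,
    "PCIe Transfer": 230,
    "TLB Update": 750,
}


def total_ns(stages):
    """Sum the per-stage latencies of a path."""
    return sum(stages.values())


def wire_time_ns(num_bytes, bandwidth_gb_per_s):
    """Raw transfer time, ignoring protocol overhead.

    With GB taken as 1e9 bytes, 1 GB/s equals 1 byte per nanosecond,
    so bytes divided by GB/s gives nanoseconds directly.
    """
    return num_bytes / bandwidth_gb_per_s


pcie_total = total_ns(PCIE_SWAP_STAGES_NS)
raw_transfer = wire_time_ns(4 * 1024, 32)

print(f"Itemized PCIe swap total: {pcie_total} ns")
print(f"Raw 4 KiB wire time at 32 GB/s: {raw_transfer:.0f} ns")
```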

CXL.mem Direct Path

~250 ns

GPU Load

Shader issues load

8 ns

CXL Request

M2S MemRd formation

15 ns

PHY Transit

SerDes + switch

50 ns

Endpoint DRAM

DDR5 access

100 ns

Return Path

PHY + CXL response

65 ns

GPU Complete

12 ns
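The CXL.mem stage figures above can be checked the same way. This is a minimal Python sketch with illustrative names (`CXL_MEM_STAGES_NS`, `stage_shares`), using values copied from the stages listed above. Here the itemized stages do add up to the ~250 ns headline, and the DDR5 access at the endpoint is the largest single contributor.

```python
# Stage latencies in nanoseconds, copied from the CXL.mem direct path above.
CXL_MEM_STAGES_NS = {
    "GPU Load": 8,
    "CXL Request": 15,
    "PHY Transit": 50,
    "Endpoint DRAM": 100,
    "Return Path": 65,
    "GPU Complete": 12,
}


def stage_shares(stages):
    """Return each stage's fraction of the total path latency."""
    total = sum(stages.values())
    return {name: ns / total for name, ns in stages.items()}


cxl_total = sum(CXL_MEM_STAGES_NS.values())
shares = stage_shares(CXL_MEM_STAGES_NS)
largest_stage = max(shares, key=shares.get)

print(f"Itemized CXL.mem total: {cxl_total} ns")
for name, share in shares.items():
    print(f"  {name:<14} {share:6.1%}")
print(f"Largest contributor: {largest_stage}")
```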

Latency Comparison (Log Scale Visual)

CXL.mem (DRAM)

250 ns Baseline

PCIe Swap (Best)

3,000 ns 12× slower

PCIe Swap (Typical)

6,000 ns 24× slower

PCIe + NVMe

25,000 ns 100× slower
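The slowdown factors in this comparison are each path's latency divided by the 250 ns CXL.mem baseline. A minimal Python sketch of that arithmetic follows; the names `LATENCY_NS` and `slowdown_vs_baseline` are illustrative, and the latencies are the headline figures listed above.

```python
# Headline latencies in nanoseconds, copied from the comparison above.
LATENCY_NS = {
    "CXL.mem (DRAM)": 250,
    "PCIe Swap (Best)": 3_000,
    "PCIe Swap (Typical)": 6_000,
    "PCIe + NVMe": 25_000,
}

BASELINE = "CXL.mem (DRAM)"


def slowdown_vs_baseline(latencies, baseline):
    """Return latency of each path as a multiple of the baseline path."""
    base = latencies[baseline]
    return {name: ns / base for name, ns in latencies.items()}


slowdowns = slowdown_vs_baseline(LATENCY_NS, BASELINE)
for name, factor in slowdowns.items():
    print(f"{name:<20} {LATENCY_NS[name]:>6} ns  {factor:g}x")
```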

Key Insight: CPU Elimination

CXL.mem removes the CPU from the critical path entirely: no interrupts, no context switches, no DMA setup. The GPU issues a load instruction and receives data directly from endpoint DRAM via hardware-managed coherence.
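To put a rough number on this, the sketch below adds up the PCIe swap stages that involve the CPU. Which stages count as CPU-mediated is an assumption made here for illustration (CPU Interrupt, Fault Handler, DMA Setup, matching the interrupts, context switches, and DMA setup named above); the names `PCIE_SWAP_STAGES_NS` and `CPU_MEDIATED` are likewise illustrative. The latencies are the itemized PCIe swap figures from this comparison.

```python
# Itemized PCIe swap stage latencies in nanoseconds (from the comparison).
PCIE_SWAP_STAGES_NS = {
    "GPU Page Fault": 750,
    "CPU Interrupt": 2000,
    "Fault Handler": 750,
    "DMA Setup": 500,
    "PCIe Transfer": 230,
    "TLB Update": 750,
}

# Assumption: these are the stages that run on, or wait for, the host CPU.
CPU_MEDIATED = ("CPU Interrupt", "Fault Handler", "DMA Setup")

cpu_ns = sum(PCIE_SWAP_STAGES_NS[name] for name in CPU_MEDIATED)
itemized_total_ns = sum(PCIE_SWAP_STAGES_NS.values())
cpu_share = cpu_ns / itemized_total_ns

print(f"CPU-mediated stages: {cpu_ns} ns of {itemized_total_ns} ns "
      f"({cpu_share:.1%} of the itemized path)")
```

Under that classification, roughly two thirds of the itemized swap latency sits in stages that the direct CXL.mem path does not have at all.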

24×