Memory mapping, hint interfaces, and fault handling for CXL.mem
CXL.mem regions appear in the GPU's unified virtual address space via PCIe BAR mapping with CXL.mem bridging. The GPU accesses remote memory using standard load/store semantics.
An extended allocation API communicates memory-access characteristics to the endpoint, enabling intelligent caching and prefetch decisions.
The GPU driver translates allocation hints into endpoint firmware configuration via CXL.io mailbox commands.
When the GPU accesses an uncached CXL.mem address, a fault triggers the full access path; the total latency is the sum of the individual stage latencies.
| Access Type | Latency | Bandwidth | CPU Involved |
|---|---|---|---|
| HBM (Local) | ~30 ns | 8 TB/s | No |
| CXL.mem (Direct) | ~200 ns | 64 GB/s per EP | No |
| PCIe DMA (Traditional) | ~13 μs | 64 GB/s | Yes (driver) |
| NVMe Read | ~10-20 μs | 14 GB/s | Yes (filesystem) |