ACSCtl showing - for every capability means ACS is disabled (good for P2P). Any + means ACS is active: peer-to-peer transactions get redirected up through the root complex, and direct P2P won't work.
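To find where ACS is still active, you can scan the ACSCtl line of every PCI function. A sketch — acs_enabled is our own helper name, and the scan should run as root so lspci can read the extended capabilities:

```shell
# acs_enabled: succeed if an lspci ACSCtl line shows any capability
# enabled, i.e. a flag name followed by '+'.
acs_enabled() {
  printf '%s\n' "$1" | grep -q '[A-Za-z]+'
}

# Scan all PCI functions and report the ones with ACS still on.
# Run as root; without it the ACS capability may not be readable.
for dev in $(lspci 2>/dev/null | awk '{print $1}'); do
  line=$(lspci -vvv -s "$dev" 2>/dev/null | grep 'ACSCtl:')
  [ -n "$line" ] || continue
  acs_enabled "$line" && echo "$dev: ACS still enabled -> $line"
done
```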
Disable ACS (If Needed)
# Add to kernel command line (GRUB): turn off ACS P2P redirect for
# specific devices (mainline kernels, v4.19+)
pci=disable_acs_redir=0000:3a:00.0
# Or disable at runtime for specific device
setpci -s 0000:3a:00.0 ECAP_ACS+6.w=0000
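On a dedicated bare-metal training box you may want ACS off everywhere, which means looping the per-device setpci command above over every PCI bridge. A sketch — disable_acs_all is our own helper name, the change is runtime-only (it reverts at reboot), and it must run as root:

```shell
# Clear the ACS control register (ECAP_ACS+6) on every PCI bridge
# (class 0604). Bridges without an ACS capability fail silently and
# are skipped.
disable_acs_all() {
  for bdf in $(lspci -d ::0604 2>/dev/null | awk '{print $1}'); do
    setpci -s "$bdf" ECAP_ACS+6.w=0000 2>/dev/null \
      && echo "ACS cleared on $bdf"
  done
}

# Run as root, then re-check with: lspci -vvv | grep ACSCtl
# disable_acs_all
```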
💡 Production Note
In virtualized environments (VMs, containers with device passthrough), you may need ACS for security. This creates a fundamental tension: security vs. performance. Some organizations use dedicated bare-metal nodes for GPU training to avoid this tradeoff.
Discovering Your Topology
nvidia-smi topo
$ nvidia-smi topo -m
        GPU0   GPU1   GPU2   GPU3   NVMe0  NVMe1  CPU
GPU0     X     NV12   SYS    SYS    PHB    NODE   SYS
GPU1    NV12    X     SYS    SYS    NODE   PHB    SYS
GPU2    SYS    SYS     X     NV12   SYS    SYS    PHB
GPU3    SYS    SYS    NV12    X     SYS    SYS    PHB
NV12
Direct NVLink connection (12 links). The fastest GPU-to-GPU path; bypasses PCIe entirely.
PHB
PCIe Host Bridge — same CPU socket, different root port. P2P possible, but traffic crosses the root complex.
NODE
Same NUMA node, crossing between PCIe host bridges. Inspect the exact path with lspci -t.
SYS
Cross-socket via QPI/UPI. P2P unlikely to work efficiently.
GPU and NVMe under the same PCIe switch [01-02] = P2P should work, provided ACS is disabled on that switch.
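The matrix tells you the link type, but to confirm that two endpoints really share an upstream switch you can compare their sysfs device paths: the longest common prefix ends at their shared upstream port. A sketch — shared_upstream is our own helper name, and the device addresses are examples:

```shell
# shared_upstream: print the longest common directory prefix of two
# sysfs device paths. The deeper the shared prefix, the closer the
# devices; a prefix ending at a bridge function means a common switch.
shared_upstream() {
  printf '%s\n%s\n' "$1" "$2" | sed 'N;s|^\(.*/\).*\n\1.*$|\1|'
}

# On a real host (addresses are examples; adjust to your topology):
# shared_upstream "$(readlink -f /sys/bus/pci/devices/0000:3a:00.0)" \
#                 "$(readlink -f /sys/bus/pci/devices/0000:3b:00.0)"
```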
NUMA and Storage Affinity
In multi-socket systems, each CPU has its own PCIe lanes. Accessing storage on the "wrong" NUMA node adds latency.
✓ Local Access
GPU → Local PCIe → NVMe
~2-3 μs latency
✗ Remote Access
GPU → QPI/UPI → Remote PCIe → NVMe
~5-8 μs latency (+100-200%)
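Once you know the right node, keep the workload on it so allocations and I/O stay local. A sketch using numactl — run_on_node is our own wrapper name, and train.py is a placeholder:

```shell
# Bind a command's CPUs and memory allocations to one NUMA node so its
# buffers stay local to the GPU/NVMe pair on that node.
run_on_node() {
  node="$1"; shift
  numactl --cpunodebind="$node" --membind="$node" "$@"
}

# e.g. run_on_node 0 python train.py   # train.py is a placeholder
```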
# Check NUMA node for a device
$ cat /sys/bus/pci/devices/0000:3a:00.0/numa_node
0
# Check NUMA affinity for each GPU (CPU/NUMA affinity columns
# in the full matrix)
$ nvidia-smi topo -m
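The per-device check above extends naturally to a quick inventory: one line per NVMe controller and GPU with its NUMA node. A sketch reading sysfs directly — SYSFS_PCI and list_numa_nodes are our own names; the PCI class codes are standard (0x0108xx = NVMe, 0x0300xx/0x0302xx = VGA/3D controller):

```shell
# List the NUMA node of every NVMe controller and GPU by reading sysfs.
# SYSFS_PCI is parameterized only so the logic is easy to test; it
# defaults to the real tree.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

list_numa_nodes() {
  for dev in "$SYSFS_PCI"/*; do
    [ -r "$dev/numa_node" ] && [ -r "$dev/class" ] || continue
    case "$(cat "$dev/class")" in
      0x0108*)          kind=NVMe ;;  # mass storage, NVM Express
      0x0300*|0x0302*)  kind=GPU  ;;  # VGA / 3D controller
      *) continue ;;
    esac
    printf '%s %s numa_node=%s\n' "$kind" "$(basename "$dev")" \
      "$(cat "$dev/numa_node")"
  done
}

list_numa_nodes
```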
💡 Best Practice
For GPUDirect Storage, ensure GPU and NVMe are on the same NUMA node AND under the same PCIe switch. This is a hardware/BIOS configuration decision — plan it before deployment.
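Before relying on that placement, let GDS itself verify it: NVIDIA ships a checker, gdscheck.py, with the GDS tools. A sketch — the path below assumes a default CUDA install, so adjust it to your layout:

```shell
# gdscheck validates driver, filesystem, and PCIe topology support for
# GPUDirect Storage; -p prints the platform support summary.
GDSCHECK=/usr/local/cuda/gds/tools/gdscheck.py
if [ -x "$GDSCHECK" ]; then
  "$GDSCHECK" -p
else
  echo "gdscheck not found; install the GDS (nvidia-gds) package first"
fi
```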