Appendix B: NVMe Deep Dive | GPU-NVMe Architecture

B.1 💾

Storage Fundamentals

Block devices, file systems, and the storage stack from applications to hardware.

B.2 🔌

PCIe Topology

Understanding PCIe lanes, switches, root complexes, and peer-to-peer connectivity.

B.3 ⚡

NVMe Fundamentals

NVMe protocol basics, command sets, and why it revolutionized storage performance.

⭐ KEY

B.4 📊

Queue Architecture

Submission queues, completion queues, and the lock-free design enabling parallelism.

B.5 📝

Commands & Completions

Command structure, completion entries, and the full I/O lifecycle.

B.6 🧠

Memory Addressing

PRP, SGL, and physical/virtual address translation for DMA operations.

B.7 🔔

Doorbell & Notifications

How software signals the controller and receives completion notifications.

B.8 📦

Controller Memory Buffer

Using the Controller Memory Buffer (CMB) to place queues in controller memory, reducing latency and PCIe traffic.

⭐ KEY

B.9 🎮

GPU Challenges

Why GPUs can't directly use NVMe: memory models, atomics, and architecture gaps.

⭐ KEY

B.10 🛤️

Data Paths

Traditional vs. GPUDirect paths, bounce buffers, and data movement optimization.

B.11 🌐

RDMA Comparison

How NVMe over Fabrics compares to local NVMe for GPU workloads.

⭐ KEY

B.12 🚀

GPUDirect Storage

NVIDIA's solution for direct NVMe-to-GPU transfers, architecture and requirements.

B.13 💡

GPU-Initiated Solutions

Research approaches for true GPU-initiated I/O without CPU involvement.

B.14 🔬

Advanced Topics

ZNS, computational storage, NVMe 2.0 features, and future directions.

B.15 💻

Code Examples

Practical code samples for NVMe queue management and GPU integration.

B.16 📈

Data Paths Visual

Interactive visualization of data movement between CPU, GPU, and NVMe.

B.17 ⚙️

Protocol Internals

Deep dive into NVMe protocol details, register mappings, and bit-level structures.
