APPENDIX B

NVMe Deep Dive

A complete technical exploration of NVMe architecture, protocols, and GPU integration challenges.

17 sections, 2.1MB of content
B.1 💾
Storage Fundamentals
Block devices, file systems, and the storage stack from applications to hardware.
B.2 🔌
PCIe Topology
Understanding PCIe lanes, switches, root complexes, and peer-to-peer connectivity.
B.3
NVMe Fundamentals
NVMe protocol basics, command sets, and why it revolutionized storage performance.
B.4 📊
Queue Architecture
Submission queues, completion queues, and the lock-free design enabling parallelism.
B.5 📝
Commands & Completions
Command structure, completion entries, and the full I/O lifecycle.
B.6 🧠
Memory Addressing
PRP, SGL, and physical/virtual address translation for DMA operations.
B.7 🔔
Doorbell & Notifications
How software signals the controller and receives completion notifications.
B.8 📦
Controller Memory Buffer
Using the CMB to place queues in controller memory, reducing latency and PCIe traffic.
B.9 🎮
GPU Challenges
Why GPUs can't easily drive NVMe directly: memory models, atomics, and architecture gaps.
B.10 🛤️
Data Paths
Traditional vs. GPUDirect paths, bounce buffers, and data movement optimization.
B.11 🌐
RDMA Comparison
How NVMe over Fabrics compares to local NVMe for GPU workloads.
B.12 🚀
GPUDirect Storage
NVIDIA's solution for direct NVMe-to-GPU transfers: its architecture and requirements.
B.13 💡
GPU-Initiated Solutions
Research approaches for true GPU-initiated I/O without CPU involvement.
B.14 🔬
Advanced Topics
ZNS, computational storage, NVMe 2.0 features, and future directions.
B.15 💻
Code Examples
Practical code samples for NVMe queue management and GPU integration.
B.16 📈
Data Paths Visual
Interactive visualization of data movement between CPU, GPU, and NVMe.
B.17 ⚙️
Protocol Internals
Deep dive into NVMe protocol details, register mappings, and bit-level structures.