APPENDIX B
NVMe Deep Dive
Complete technical exploration of NVMe architecture, protocols, and GPU integration challenges.
17 Sections, 2.1MB Content
B.1
Storage Fundamentals
Block devices, file systems, and the storage stack from applications to hardware.
B.2
PCIe Topology
Understanding PCIe lanes, switches, root complexes, and peer-to-peer connectivity.
B.3
NVMe Fundamentals
NVMe protocol basics, command sets, and why it revolutionized storage performance.
⭐ KEY
B.4
Queue Architecture
Submission queues, completion queues, and the lock-free design enabling parallelism.
B.5
Commands & Completions
Command structure, completion entries, and the full I/O lifecycle.
B.6
Memory Addressing
Physical Region Pages (PRP), Scatter Gather Lists (SGL), and physical/virtual address translation for DMA operations.
B.7
Doorbell & Notifications
How software signals the controller and receives completion notifications.
B.8
Controller Memory Buffer
Using the Controller Memory Buffer (CMB) to place queues on the controller, reducing latency and PCIe traffic.
⭐ KEY
B.9
GPU Challenges
Why GPUs can't directly use NVMe: memory models, atomics, and architecture gaps.
⭐ KEY
B.10
Data Paths
Traditional vs. GPUDirect paths, bounce buffers, and data movement optimization.
B.11
RDMA Comparison
How NVMe over Fabrics compares to local NVMe for GPU workloads.
⭐ KEY
B.12
GPUDirect Storage
NVIDIA's solution for direct NVMe-to-GPU transfers, architecture and requirements.
B.13
GPU-Initiated Solutions
Research approaches for true GPU-initiated I/O without CPU involvement.
B.14
Advanced Topics
Zoned Namespaces (ZNS), computational storage, NVMe 2.0 features, and future directions.
B.15
Code Examples
Practical code samples for NVMe queue management and GPU integration.
B.16
Data Paths Visual
Interactive visualization of data movement between CPU, GPU, and NVMe.
B.17
Protocol Internals
Deep dive into NVMe protocol details, register mappings, and bit-level structures.