APPENDIX A

GPU & CUDA Fundamentals

Essential GPU architecture concepts, execution models, and performance optimization techniques for understanding GPU-NVMe interactions.

A.1 🏛️ GPU Architecture
CPU vs GPU philosophy, SIMT execution model, Streaming Multiprocessor internals, and modern GPU specifications.
Topics: SIMT · SM Architecture · H100 / H200 · MI300X
A.2 🧬 Execution Model
Thread hierarchy from grids to threads, warp formation mechanics, block-to-SM assignment, and lockstep execution.
Topics: Thread Hierarchy · Warps · Block Assignment · SIMT Execution
A.3 Performance
Occupancy calculation, warp divergence costs, warp scheduling, memory coalescing patterns, and thread coarsening.
Topics: Occupancy · Divergence · Coalescing · Memory Hierarchy
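The coalescing arithmetic behind this entry can be sketched numerically. The model below is a simplification, assuming 32-thread warps, 4-byte loads, and 128-byte memory transactions (the usual figures for recent NVIDIA GPUs); the function name is ours, not a CUDA API.

```python
# Count how many 128-byte memory transactions a 32-thread warp issues
# for 4-byte loads at a given element stride (simplified coalescing model).
WARP_SIZE = 32
SEGMENT_BYTES = 128
ELEM_BYTES = 4

def transactions_per_warp(stride_elems):
    # Byte address each lane in the warp touches
    addrs = [lane * stride_elems * ELEM_BYTES for lane in range(WARP_SIZE)]
    # Distinct 128-byte segments covered by those addresses
    segments = {addr // SEGMENT_BYTES for addr in addrs}
    return len(segments)

print(transactions_per_warp(1))   # unit stride, fully coalesced: 1 transaction
print(transactions_per_warp(32))  # large stride: 32 transactions, 32x the traffic
```

The same access pattern thus costs anywhere from 1 to 32 transactions per warp depending only on stride, which is why coalescing dominates memory-bound kernel performance.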
A.4 🔄 Synchronization
Why GPUs use polling instead of interrupts, timeline comparisons, polling patterns, and CPU vs GPU code differences.
Topics: Interrupts vs Polling · Timeline · Code Patterns
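The polling pattern this entry covers can be illustrated with a host-side analogue: a GPU thread cannot receive an interrupt, so it detects completion by repeatedly re-reading a flag in memory that the device writes. The sketch below simulates that with Python threads; the flag and the 50 ms "I/O latency" are stand-ins, not a real NVMe completion path.

```python
import threading
import time

done_flag = [0]  # stands in for a completion flag the device would write

def device_work():
    time.sleep(0.05)   # simulate I/O latency
    done_flag[0] = 1   # "completion" written to shared memory

worker = threading.Thread(target=device_work)
worker.start()

# Polling loop -- the GPU-style pattern: no interrupt handler,
# just re-read the flag until it changes.
polls = 0
while done_flag[0] == 0:
    polls += 1         # each iteration burns cycles but needs no OS support

worker.join()
print("completed after", polls, "polls")
```

On a GPU, thousands of threads spinning like this is cheap relative to the machinery an interrupt would require, which is the trade-off the section's timelines compare.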
A.5 📋 Quick Reference
GPU specification tables, memory scope reference, key formulas for occupancy and thread mapping, and essential constants.
Topics: GPU Specs · Memory Scope · Formulas · Constants
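Two of the formulas this reference collects can be sketched directly. The occupancy calculation below assumes a 64-warp-per-SM limit and 32-thread warps (typical of recent NVIDIA architectures, but check the spec tables for a given GPU); the thread-mapping line is the standard 1D CUDA expression `blockIdx.x * blockDim.x + threadIdx.x`.

```python
# Theoretical occupancy: active resident warps vs. the SM's warp limit.
def occupancy(threads_per_block, blocks_per_sm,
              max_warps_per_sm=64, warp_size=32):
    warps_per_block = -(-threads_per_block // warp_size)  # ceiling division
    active_warps = warps_per_block * blocks_per_sm
    return min(active_warps, max_warps_per_sm) / max_warps_per_sm

# Global thread index for a 1D launch:
#   global_id = blockIdx.x * blockDim.x + threadIdx.x
def global_id(block_idx, block_dim, thread_idx):
    return block_idx * block_dim + thread_idx

print(occupancy(256, 8))     # 8 warps/block * 8 blocks = 64/64 -> 1.0
print(global_id(2, 256, 5))  # 2*256 + 5 = 517
```

In practice the resident block count is also capped by registers and shared memory per SM, which is why measured occupancy is often below this theoretical figure.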
A.6 🎬 CUDA Execution Animation
Interactive visualization of block assignment to SMs, warp scheduling, and latency hiding in action.
Topics: Animation · Block Queue · Warp Scheduler · Latency Hiding
A.7 ⚔️ CPU vs GPU Deep Dive
Comprehensive comparison of CPU and GPU I/O workloads, warp divergence, memory coalescing, SIMT architecture, and memory hierarchy.
Topics: Warp Divergence · Memory Hierarchy · SIMT · Coalescing
A.8 🔄 Thread Synchronization Crisis
Analysis of warp efficiency, polling vs interrupts, synchronization costs, and the scale problem in GPU I/O.
Topics: Warp Efficiency · Polling vs Interrupts · Sync Cost · Scale Problem
A.9 📊 CUDA Visual Guide
Interactive visualizations for occupancy, warp scheduling, thread coarsening, divergence, and memory coalescing.
Topics: Occupancy · Warp Scheduling · Memory Coalescing · Divergence