GPU & CUDA Fundamentals
Essential GPU architecture concepts, execution models, and performance optimization techniques for understanding GPU-NVMe interactions.
GPU Architecture
CPU vs GPU philosophy, SIMT execution model, Streaming Multiprocessor internals, and modern GPU specifications.
Execution Model
Thread hierarchy from grids to threads, warp formation mechanics, block-to-SM assignment, and lockstep execution.
Performance
Occupancy calculation, warp divergence costs, warp scheduling, memory coalescing patterns, and thread coarsening.
Synchronization
Why GPUs use polling instead of interrupts, timeline comparisons, polling patterns, and CPU vs GPU code differences.
Quick Reference
GPU specification tables, memory scope reference, key formulas for occupancy and thread mapping, and essential constants.
CUDA Execution Animation
Interactive visualization of block assignment to SMs, warp scheduling, and latency hiding in action.
CPU vs GPU Deep Dive
Comprehensive comparison of CPU and GPU I/O workloads, warp divergence, memory coalescing, SIMT architecture, and memory hierarchy.
Thread Synchronization Crisis
Analysis of warp efficiency, polling vs interrupts, synchronization costs, and the scale problem in GPU I/O.
CUDA Visual Guide
Interactive visualizations for occupancy, warp scheduling, thread coarsening, divergence, and memory coalescing.