APPENDIX A

GPU & CUDA Fundamentals

Essential GPU architecture concepts, execution models, and performance optimization techniques for understanding GPU-NVMe interactions.

A.1 🏛️ GPU Architecture
CPU vs GPU philosophy, SIMT execution model, Streaming Multiprocessor internals, and modern GPU specifications.
Topics: SIMT · SM Architecture · H100 / H200 · MI300X
A.2 🧬 Execution Model
Thread hierarchy from grids to threads, warp formation mechanics, block-to-SM assignment, and lockstep execution.
Topics: Thread Hierarchy · Warps · Block Assignment · SIMT Execution
A.3 Performance
Occupancy calculation, warp divergence costs, warp scheduling, memory coalescing patterns, and thread coarsening.
Topics: Occupancy · Divergence · Coalescing · Memory Hierarchy
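The coalescing arithmetic behind this entry can be sketched numerically. The model below is a simplification, assuming 32-thread warps, 4-byte loads, and 128-byte memory transactions (the usual figures for recent NVIDIA GPUs); the function name is ours, not a CUDA API.

```python
# Count how many 128-byte memory transactions a 32-thread warp issues
# for 4-byte loads at a given element stride (simplified coalescing model).
WARP_SIZE = 32
SEGMENT_BYTES = 128
ELEM_BYTES = 4

def transactions_per_warp(stride_elems):
    # Byte address each lane in the warp touches
    addrs = [lane * stride_elems * ELEM_BYTES for lane in range(WARP_SIZE)]
    # Distinct 128-byte segments covered by those addresses
    segments = {addr // SEGMENT_BYTES for addr in addrs}
    return len(segments)

print(transactions_per_warp(1))   # unit stride, fully coalesced: 1 transaction
print(transactions_per_warp(32))  # large stride: 32 transactions, 32x the traffic
```

The same access pattern thus costs anywhere from 1 to 32 transactions per warp depending only on stride, which is why coalescing dominates memory-bound kernel performance.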
A.4 🔄 Synchronization
Why GPUs use polling instead of interrupts, timeline comparisons, polling patterns, and CPU vs GPU code differences.
Topics: Interrupts vs Polling · Timeline · Code Patterns
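The polling pattern this entry covers can be illustrated with a host-side analogue: a GPU thread cannot receive an interrupt, so it detects completion by repeatedly re-reading a flag in memory that the device writes. The sketch below simulates that with Python threads; the flag and the 50 ms "I/O latency" are stand-ins, not a real NVMe completion path.

```python
import threading
import time

done_flag = [0]  # stands in for a completion flag the device would write

def device_work():
    time.sleep(0.05)   # simulate I/O latency
    done_flag[0] = 1   # "completion" written to shared memory

worker = threading.Thread(target=device_work)
worker.start()

# Polling loop -- the GPU-style pattern: no interrupt handler,
# just re-read the flag until it changes.
polls = 0
while done_flag[0] == 0:
    polls += 1         # each iteration burns cycles but needs no OS support

worker.join()
print("completed after", polls, "polls")
```

On a GPU, thousands of threads spinning like this is cheap relative to the machinery an interrupt would require, which is the trade-off the section's timelines compare.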
A.5 📋 Quick Reference
GPU specification tables, memory scope reference, key formulas for occupancy and thread mapping, and essential constants.
Topics: GPU Specs · Memory Scope · Formulas · Constants
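Two of the formulas this reference collects can be sketched directly. The occupancy calculation below assumes a 64-warp-per-SM limit and 32-thread warps (typical of recent NVIDIA architectures, but check the spec tables for a given GPU); the thread-mapping line is the standard 1D CUDA expression `blockIdx.x * blockDim.x + threadIdx.x`.

```python
# Theoretical occupancy: active resident warps vs. the SM's warp limit.
def occupancy(threads_per_block, blocks_per_sm,
              max_warps_per_sm=64, warp_size=32):
    warps_per_block = -(-threads_per_block // warp_size)  # ceiling division
    active_warps = warps_per_block * blocks_per_sm
    return min(active_warps, max_warps_per_sm) / max_warps_per_sm

# Global thread index for a 1D launch:
#   global_id = blockIdx.x * blockDim.x + threadIdx.x
def global_id(block_idx, block_dim, thread_idx):
    return block_idx * block_dim + thread_idx

print(occupancy(256, 8))     # 8 warps/block * 8 blocks = 64/64 -> 1.0
print(global_id(2, 256, 5))  # 2*256 + 5 = 517
```

In practice the resident block count is also capped by registers and shared memory per SM, which is why measured occupancy is often below this theoretical figure.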
A.6 🎬 CUDA Execution Animation
Interactive visualization of block assignment to SMs, warp scheduling, and latency hiding in action.
Topics: Animation · Block Queue · Warp Scheduler · Latency Hiding
A.7 ⚔️ CPU vs GPU Deep Dive
Comprehensive comparison of CPU and GPU I/O workloads, warp divergence, memory coalescing, SIMT architecture, and memory hierarchy.
Topics: Warp Divergence · Memory Hierarchy · SIMT · Coalescing
A.8 🔄 Thread Synchronization Crisis
Analysis of warp efficiency, polling vs interrupts, synchronization costs, and the scale problem in GPU I/O.
Topics: Warp Efficiency · Polling vs Interrupts · Sync Cost · Scale Problem
A.9 📊 CUDA Visual Guide
Interactive visualizations for occupancy, warp scheduling, thread coarsening, divergence, and memory coalescing.
Topics: Occupancy · Warp Scheduling · Memory Coalescing · Divergence