PyTorch to Silicon

A choreographed journey through the complete GPU architecture stack — from high-level PyTorch operations down to tensor cores and matrix units

25 Visualizations
5 Stack Layers
NVIDIA + AMD Coverage

The Data Flow Journey

Watch tensors flow from Python through CUDA to silicon

🐍 PyTorch: model.forward()
Operators: cuBLAS / cuDNN
📡 Communication: NCCL / FSDP
🔧 Hardware: GPU / Interconnect
💎 Silicon: Tensor Cores
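
To make the top and bottom of this stack concrete, here is a minimal pure-Python sketch of what a linear layer's forward pass reduces to: a matrix multiply (the kind of work an operator library such as cuBLAS handles on a GPU), broken into small tile multiply-accumulate steps (the kind of operation tensor cores and matrix units perform). It runs on the CPU and calls no real GPU API. The function names `linear_forward` and `tiled_matmul` and the tile size of 2 are assumptions chosen for illustration, not part of PyTorch or any vendor library.

```python
# Illustrative only: pure Python, no GPU involved.
# Function names and the tile size are hypothetical.

def tiled_matmul(a, b, tile=2):
    """Compute a @ b by accumulating tile x tile sub-products."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # One tile multiply-accumulate: C_tile += A_tile @ B_tile
                for i in range(i0, min(i0 + tile, m)):
                    for j in range(j0, min(j0 + tile, n)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


def linear_forward(x, weight, bias):
    """Compute y = x @ weight^T + bias, with weight stored as (out, in)."""
    w_t = [list(col) for col in zip(*weight)]  # transpose to (in, out)
    y = tiled_matmul(x, w_t, tile=2)
    return [[v + b for v, b in zip(row, bias)] for row in y]


x = [[1.0, 2.0, 3.0]]        # batch of 1, 3 input features
weight = [[1.0, 0.0, 1.0],   # 2 output features, 3 input features
          [0.0, 1.0, 0.0]]
bias = [0.5, -0.5]

y = linear_forward(x, weight, bias)
print(y)
```

On real hardware the tile loops are not written by hand like this. The operator library picks a kernel, and the innermost tile step maps to hardware matrix instructions with much larger, fixed tile shapes.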

🔬 Deep Dive: NVIDIA & AMD Tensor Architecture

18 additional visualizations covering tensor cores, matrix units, TMEM, AGPRs, and more

Open Tensor Core Library