© 2025 Subramaniyam (Sam) Pooni
All Rights Reserved
Proprietary & Confidential
Chapter 2

Background: LLM Inference Fundamentals

Understanding prefill vs decode, the roofline model, and why KV-cache exists.

Figures in Chapter: 8
Inference Phases: 2
Recompute Savings: 1000×

2.1 Inference Fundamentals

Visual Appendix: Background Figures

2.2 Prefill vs Decode Phases

Figures 2.1-2.3: Inference Phases
Section 2.1: Inference Fundamentals Deep Dive

2.3 Why KV-Cache Exists

Figure 2.4: With vs Without KV-Cache
Figure 2.5: KV-Cache Growth

2.4 Roofline Model

Figure 2.7: Roofline Analysis

2.5 Current Approaches

Figure 2.8: Approaches Comparison Matrix