© 2025 Subramaniyam (Sam) Pooni
All Rights Reserved
Proprietary & Confidential
Chapter 1

Introduction: The Memory Wall Problem

Why the KV-cache is the critical bottleneck in LLM inference, and what it costs the industry.

Figures in this chapter: 5. Headline numbers: ~41 GB of KV-cache at 128K context; ~320 KB per token.
1.1 The Memory Wall

Modern LLMs face a fundamental constraint: the KV-cache grows linearly with context length, rapidly exhausting GPU memory.
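The chapter's headline numbers can be checked with a quick back-of-the-envelope calculation. The model configuration below (80 layers, 8 KV heads, head dimension 128, fp16) is an assumption chosen to match a typical 70B-class model with grouped-query attention; it is not specified in the text.

```python
# Back-of-the-envelope KV-cache sizing. The model configuration is an
# assumption (a 70B-class model with grouped-query attention), not a
# figure taken from this chapter.
n_layers = 80        # transformer layers (assumed)
n_kv_heads = 8       # KV heads under grouped-query attention (assumed)
head_dim = 128       # dimension per attention head (assumed)
dtype_bytes = 2      # fp16/bf16 elements

# Each cached token stores one K and one V vector per layer.
per_token = 2 * n_layers * n_kv_heads * head_dim * dtype_bytes
print(f"per token: {per_token / 1024:.0f} KB")     # prints "per token: 320 KB"

# Cache size grows linearly with context length.
context = 128_000    # "128K" context, taken here as 128,000 tokens
total = per_token * context
print(f"at 128K: {total / 1e9:.1f} GB")            # prints "at 128K: 41.9 GB"
```

Under these assumptions the per-token cost comes out to 320 KB and the 128K-context total to roughly 41 GB, consistent with the chapter's headline figures.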

Visual Appendix — Introduction Figures

1.2 Distributed Endpoint Solution

Figure 1.4–1.5 — Distributed Endpoint Architecture

1.3 KV-Cache Mathematics

Interactive KV-Cache Size Calculator
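As a sketch of what such a calculator computes, the function below sizes the KV-cache from the model configuration. The function name, parameters, and example configuration are assumptions for illustration, not the book's published calculator source.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, dtype_bytes: int = 2) -> int:
    """Total KV-cache size in bytes: one K and one V vector per layer per token."""
    return 2 * n_layers * n_kv_heads * head_dim * dtype_bytes * context_len

# Example: an assumed 70B-class configuration, showing the linear growth
# with context length that motivates this chapter.
for ctx in (8_000, 32_000, 128_000):
    size = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128, context_len=ctx)
    print(f"{ctx:>7} tokens -> {size / 1e9:5.1f} GB")
# prints:
#    8000 tokens ->   2.6 GB
#   32000 tokens ->  10.5 GB
#  128000 tokens ->  41.9 GB
```

Doubling the context doubles the cache: memory scales with tokens held, not tokens generated, which is why long-context serving exhausts GPU memory so quickly.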