© 2025 Subramaniyam (Sam) Pooni
All Rights Reserved
Proprietary & Confidential
Appendix C

KV-Cache Structure and Mathematics

Size formulas, growth analysis, and multi-user scaling calculations.

C.1 KV-Cache Size Formula

KV_size = 2 × L × n_kv × d_head × S × sizeof(dtype)

Symbol          Meaning               Llama-70B
2               K and V tensors       -
L               Number of layers      80
n_kv            KV heads per layer    8
d_head          Head dimension        128
S               Sequence length       Variable
sizeof(dtype)   Bytes per element     2 (BF16)
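
As a minimal sketch, the formula can be written as a small Python function. The function and parameter names are illustrative and not taken from any library; the defaults simply mirror the Llama-70B column above, and the result is for a single sequence.

```python
def kv_cache_bytes(seq_len: int,
                   num_layers: int = 80,
                   num_kv_heads: int = 8,
                   head_dim: int = 128,
                   bytes_per_elem: int = 2) -> int:
    """KV-cache size in bytes for one sequence of seq_len tokens."""
    # The leading 2 accounts for storing both the K and the V tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


# Size for a single token with the assumed Llama-70B defaults.
print(kv_cache_bytes(seq_len=1))  # 327680
```

Changing a default (for example, bytes_per_elem=1 for an 8-bit cache) scales the result linearly, since every factor enters the product once.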

C.2 Per-Token Size

Per_token = 2 × 80 × 8 × 128 × 2 = 327,680 bytes = 320 KB (1 KB = 1,024 bytes)

C.3 Scaling by Context Length

Context        KV-Cache Size
4K tokens      1.3 GB
32K tokens     10 GB
128K tokens    41 GB
1M tokens      320 GB
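
The table can be approximately reproduced with a short script. This is a sketch under stated assumptions: the per-token size is the 327,680 bytes from Section C.2, "K" is taken as 1,024 tokens and "M" as 1,024 × 1,024 tokens, and sizes are printed both in GiB (2^30 bytes) and in GB (10^9 bytes). The exact figures depend on which of these conventions is used, so they can differ slightly from the rounded values in the table. All names are illustrative.

```python
BYTES_PER_TOKEN = 2 * 80 * 8 * 128 * 2  # 327,680 bytes per token (Section C.2)


def kv_cache_gib(tokens: int) -> float:
    """KV-cache size for one sequence, in GiB (2**30 bytes)."""
    return tokens * BYTES_PER_TOKEN / 2**30


def kv_cache_gb(tokens: int) -> float:
    """KV-cache size for one sequence, in decimal GB (10**9 bytes)."""
    return tokens * BYTES_PER_TOKEN / 1e9


# Assumed token counts: K = 1,024 and M = 1,024 * 1,024.
CONTEXTS = {
    "4K": 4 * 1024,
    "32K": 32 * 1024,
    "128K": 128 * 1024,
    "1M": 1024 * 1024,
}

for label, tokens in CONTEXTS.items():
    print(f"{label:>5} tokens: {kv_cache_gib(tokens):8.2f} GiB"
          f" {kv_cache_gb(tokens):8.2f} GB")
```

Under these assumptions the binary-unit results are 1.25, 10, 40, and 320 GiB, which is close to the rounded table values.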

C.4 Multi-User Memory

Total = Model_Weights + (Users × KV_per_user) + Activations

For 8 users, each at 128K context, with 140 GB of BF16 weights (70B parameters × 2 bytes) and about 5 GB of activations:

Total = 140 GB + (8 × 41 GB) + 5 GB = 473 GB

This is roughly 2.5× the 192 GB capacity of a single B200 (473 / 192 ≈ 2.46).
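
The same arithmetic can be sketched in Python to explore other user counts. The function names are hypothetical, and the defaults (140 GB of weights, 41 GB of KV cache per user, 5 GB of activations) are the figures used in this section rather than measured values.

```python
import math


def total_memory_gb(users: int,
                    kv_per_user_gb: float = 41.0,
                    weights_gb: float = 140.0,
                    activations_gb: float = 5.0) -> float:
    """Total = Model_Weights + (Users * KV_per_user) + Activations."""
    return weights_gb + users * kv_per_user_gb + activations_gb


def max_users(capacity_gb: float,
              kv_per_user_gb: float = 41.0,
              weights_gb: float = 140.0,
              activations_gb: float = 5.0) -> int:
    """Largest user count whose total fits within capacity_gb."""
    free_gb = capacity_gb - weights_gb - activations_gb
    return max(0, math.floor(free_gb / kv_per_user_gb))


total = total_memory_gb(users=8)
print(total)                    # 473.0
print(round(total / 192.0, 2))  # 2.46
print(max_users(192.0))         # 1
```

With these figures, only one 128K-context user would fit alongside the weights within 192 GB.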