Size formulas, growth analysis, and multi-user scaling calculations.
| Symbol | Meaning | Llama-70B |
|---|---|---|
| 2 | K and V tensors | — |
| L | Number of layers | 80 |
| nkv | KV heads per layer | 8 |
| dhead | Head dimension | 128 |
| S | Sequence length | Variable |
| sizeof | Bytes per element | 2 (BF16) |
| Context | KV-Cache Size |
|---|---|
| 4K tokens | 1.3 GB |
| 32K tokens | 10 GB |
| 128K tokens | 41 GB |
| 1M tokens | 320 GB |
For 8 users at 128K context:
This exceeds B200's 192 GB capacity by 2.5×.