1.75 GB ÷ 192 GB/s = 9.1 ms
Borderline → works if compute is slow
✓ 5 Endpoints (320 GB/s)
1.75 GB ÷ 320 GB/s = 5.5 ms
Prefetch completes before compute → no stall
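The arithmetic above can be reproduced with a short sketch. The 1.75 GB layer size and the 192 GB/s and 320 GB/s aggregate bandwidths come from the figures above; the per-layer compute time of 8 ms is a hypothetical value chosen only to illustrate the stall check, and the function names are illustrative.

```python
def prefetch_ms(layer_gb: float, bandwidth_gb_s: float) -> float:
    """Time to stream one layer's weights, in milliseconds."""
    return layer_gb / bandwidth_gb_s * 1000.0


def stalls(layer_gb: float, bandwidth_gb_s: float, compute_ms: float) -> bool:
    """True if the next layer's prefetch outlasts the current layer's compute."""
    return prefetch_ms(layer_gb, bandwidth_gb_s) > compute_ms


LAYER_GB = 1.75      # per-layer weight size from the text
COMPUTE_MS = 8.0     # hypothetical per-layer compute time (assumption)

for bandwidth in (192.0, 320.0):  # aggregate bandwidths from the text, GB/s
    t = prefetch_ms(LAYER_GB, bandwidth)
    verdict = "stall" if stalls(LAYER_GB, bandwidth, COMPUTE_MS) else "no stall"
    print(f"{bandwidth:.0f} GB/s: prefetch {t:.1f} ms vs compute {COMPUTE_MS:.1f} ms -> {verdict}")
```

With a slower assumed compute time (say 10 ms), the 192 GB/s case would also avoid a stall, which is what "borderline" means here.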
🦙
Llama-70B Example
140 GB FP16 weights total
Endpoint DRAM: All Model Weights (140 GB)
All layers resident → no prefetch needed
Endpoint Flash: KV-Cache Overflow (1+ TB)
Long context / multi-session storage
✓
With enough endpoint DRAM to hold the full 140 GB of weights, all layers stay resident: no prefetch latency, no layer swapping.
Flash handles KV-cache overflow for 128K+ context windows and multi-tenant session storage.
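A rough sizing sketch shows where these numbers come from. It assumes about 70 billion parameters at 2 bytes each (FP16) and, for the KV-cache estimate, the published Llama 2 70B shape (80 layers, 8 KV heads under grouped-query attention, head dimension 128). Those shape values are assumptions not stated in this section, and dividing the weights evenly across layers ignores embeddings.

```python
PARAMS = 70e9          # approximate parameter count (assumption)
BYTES_FP16 = 2         # bytes per FP16 value
N_LAYERS = 80          # assumed layer count (Llama 2 70B)
N_KV_HEADS = 8         # assumed KV heads (grouped-query attention)
HEAD_DIM = 128         # assumed head dimension


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Total weight footprint in GB (decimal, 1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def kv_cache_gb(tokens: int) -> float:
    """KV-cache footprint in GB for one sequence of `tokens` tokens."""
    # Factor 2 covers keys and values; one entry per layer, KV head, and dim.
    bytes_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_FP16
    return bytes_per_token * tokens / 1e9


total = weights_gb(PARAMS, BYTES_FP16)
print(f"weights: {total:.0f} GB total, {total / N_LAYERS:.2f} GB per layer")
print(f"KV cache at 128K tokens: {kv_cache_gb(128 * 1024):.1f} GB per sequence")
```

Under these assumptions a single 128K-token sequence needs tens of gigabytes of KV cache on top of the weights, so several concurrent long-context sessions can exceed DRAM and spill to flash.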