© 2025 Subramaniyam (Sam) Pooni
All Rights Reserved
Proprietary & Confidential
Appendix I

Bandwidth and Latency Calculations

Decode bandwidth requirements and prefetch window sizing.

I.1 Decode Bandwidth Requirement

For Llama-70B generating 20 tokens/second:

BW = (Model + KV) / time_per_token = (140 + 41) GB / 50 ms = 3.62 TB/s

I.2 Per-Layer Transfer

ComponentSize
Attention weights536 MB
FFN weights1.2 GB
KV-cache slice512 MB
Total per layer2.27 GB

I.3 Prefetch Window Sizing

With 2-layer prefetch depth and 625 μs per layer:

Prefetch_time = 2 × 625 μs = 1.25 ms
Prefetchable_data = 1.25 ms × 256 GB/s = 320 MB

This covers ~0.14× one layer—enough to hide most CXL latency through overlap.

I.4 Endpoint Count Trade-offs

EndpointsBWCapacityCost
2128 GB/s512 GB$2,500
4256 GB/s1 TB$5,000
8512 GB/s2 TB$10,000

4 endpoints is optimal for cost/performance balance.