Decode bandwidth requirements and prefetch window sizing.
For Llama-70B generating 20 tokens/second:
| Component | Size |
|---|---|
| Attention weights | 536 MB |
| FFN weights | 1.2 GB |
| KV-cache slice | 512 MB |
| Total per layer | ~2.25 GB |
At 20 tokens/second, each decode step has a 50 ms budget; spread across Llama-70B's 80 layers, that is 625 μs of compute per layer. A 2-layer prefetch depth therefore opens a 1.25 ms window ahead of each layer.
At 256 GB/s (the 4-endpoint configuration below), that window moves ~320 MB — about 0.14× one layer's weights. That is not enough to stream full layers, but it is enough to hide most CXL latency through overlap.
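The window arithmetic can be checked with a short calculation. The layer count (80 for Llama-70B) is standard; the 256 GB/s bandwidth figure is an assumption taken from the 4-endpoint configuration in the table below, since it is what makes the ~0.14× coverage figure work out:

```python
# Prefetch window sizing for Llama-70B decode.
TOKENS_PER_SEC = 20
NUM_LAYERS = 80            # Llama-70B decoder layers
LAYER_BYTES = 2.25e9       # ~2.25 GB per layer (attention + FFN + KV slice)
PREFETCH_DEPTH = 2         # layers prefetched ahead
CXL_BW = 256e9             # assumed: 4-endpoint configuration, bytes/s

per_token_s = 1 / TOKENS_PER_SEC          # 50 ms per decode step
per_layer_s = per_token_s / NUM_LAYERS    # 625 us compute budget per layer
window_s = PREFETCH_DEPTH * per_layer_s   # 1.25 ms prefetch window
bytes_in_window = CXL_BW * window_s       # data movable inside the window
coverage = bytes_in_window / LAYER_BYTES  # fraction of one layer covered

print(f"per-layer budget: {per_layer_s * 1e6:.0f} us")
print(f"window: {window_s * 1e3:.2f} ms moves {bytes_in_window / 1e6:.0f} MB")
print(f"coverage: {coverage:.2f}x of one layer")
```

Running this reproduces the 625 μs per-layer budget and the ~0.14× coverage figure.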
| Endpoints | Aggregate BW | Capacity | Est. cost (USD) |
|---|---|---|---|
| 2 | 128 GB/s | 512 GB | $2,500 |
| 4 | 256 GB/s | 1 TB | $5,000 |
| 8 | 512 GB/s | 2 TB | $10,000 |
Of these, the 4-endpoint configuration offers the best balance of cost and performance.