© 2025 Subramaniyam (Sam) Pooni
All Rights Reserved
Proprietary & Confidential
Appendix H

Memory Hierarchy and Caching Theory

Three-tier design, effective latency formulas, and hit rate analysis.

H.1 Three-Tier Hierarchy

Tier                     Media            Capacity   Latency
Tier 0: HBM Pinned       GPU HBM          ~5 GB      100 ns
Tier 1: HBM Evictable    GPU HBM          ~37 GB     100 ns
Tier 2: CXL DRAM         Endpoint DDR5    1 TB       250 ns
Tier 3: Flash            NVMe SSD         16 TB      25 μs
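
As an illustration only, the hierarchy can be written down as a small lookup structure. The class name Tier, the field names, and the helper hbm_capacity_gb are hypothetical and not part of the design. Capacities are the approximate values from the table, with 1 TB taken as 1000 GB (an assumption).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One row of the hierarchy table (hypothetical representation)."""
    name: str
    media: str
    capacity_gb: float  # approximate; assumes 1 TB = 1000 GB
    latency_ns: float   # access latency in nanoseconds


# Values copied from the table above; 25 us is written as 25_000 ns.
TIERS = [
    Tier("Tier 0: HBM Pinned", "GPU HBM", 5, 100),
    Tier("Tier 1: HBM Evictable", "GPU HBM", 37, 100),
    Tier("Tier 2: CXL DRAM", "Endpoint DDR5", 1_000, 250),
    Tier("Tier 3: Flash", "NVMe SSD", 16_000, 25_000),
]


def hbm_capacity_gb(tiers=TIERS):
    """Sum the capacity of every tier that lives in GPU HBM."""
    return sum(t.capacity_gb for t in tiers if t.media == "GPU HBM")


print(hbm_capacity_gb())  # pinned + evictable HBM, in GB
```

Tier 0 and Tier 1 share the same media and latency; in this sketch they differ only in name, which mirrors the pinned versus evictable split in the table.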

H.2 Effective Latency

$L_{\text{eff}} = \sum_i \text{hit\_rate}_i \times \text{latency}_i$

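With a 95% HBM hit rate, 4.5% of accesses served from CXL DRAM, and 0.5% served from flash:

$L_{\text{eff}} = 0.95 \times 100 + 0.045 \times 250 + 0.005 \times 25000 = 95 + 11.25 + 125 = 231.25\ \text{ns} \approx 231\ \text{ns}$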
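
The weighted sum above can be checked with a few lines of Python. This is a minimal sketch: the function name effective_latency_ns and its argument layout are assumptions made for illustration, and the input values are the ones quoted in this section.

```python
import math


def effective_latency_ns(hit_rates, latencies_ns):
    """Return the hit-rate-weighted average latency in nanoseconds.

    hit_rates    -- fraction of accesses served by each tier (must sum to 1)
    latencies_ns -- access latency of the corresponding tier, in ns
    """
    if len(hit_rates) != len(latencies_ns):
        raise ValueError("need one latency per hit rate")
    if not math.isclose(sum(hit_rates), 1.0, abs_tol=1e-9):
        raise ValueError("hit rates must sum to 1")
    return sum(h * l for h, l in zip(hit_rates, latencies_ns))


# HBM, CXL DRAM, flash (25 us expressed as 25_000 ns)
l_eff = effective_latency_ns([0.95, 0.045, 0.005], [100, 250, 25_000])
print(f"{l_eff:.2f} ns")  # 231.25 ns
```

The flash term contributes 125 ns of the total even though only 0.5% of accesses reach flash, so the result is most sensitive to that last fraction.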

H.3 Hit Rate Breakdown

Technique              Contribution   Cumulative
LRU baseline           -              70%
+ Anchor pinning       +8%            78%
+ EMA scoring          +7%            85%
+ Per-head tracking    +6%            91%
+ RoPE prefetch        +4%            95%
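
The cumulative column is a running sum of the baseline and the per-technique contributions. The sketch below reproduces it, treating the contributions as additive percentage points exactly as the table does. The names BASELINE, GAINS, and cumulative_hit_rates are hypothetical.

```python
from itertools import accumulate

# Values from the table above, in percentage points.
BASELINE = ("LRU baseline", 70)
GAINS = [
    ("Anchor pinning", 8),
    ("EMA scoring", 7),
    ("Per-head tracking", 6),
    ("RoPE prefetch", 4),
]


def cumulative_hit_rates(baseline, gains):
    """Return (technique, cumulative hit rate in %) pairs as a running sum."""
    names = [baseline[0]] + [name for name, _ in gains]
    totals = accumulate([baseline[1]] + [gain for _, gain in gains])
    return list(zip(names, totals))


for name, total in cumulative_hit_rates(BASELINE, GAINS):
    print(f"{name:<20} {total}%")
```

The final running total, 95%, matches the HBM hit rate used in the effective latency example in H.2.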