© 2025 Subramaniyam (Sam) Pooni
All Rights Reserved
Proprietary & Confidential
Chapter 4

Effective Latency Analysis

Two-tier cache model, CXL vs PCIe comparison, and the 65× latency improvement.

Figures in chapter: 9. Key numbers: CXL.mem 250 ns, PCIe DMA 16 μs, improvement 65×.

4.1 Two-Tier Cache Model

| Tier | Media | Latency | Capacity |
|---|---|---|---|
| Tier 1 | Endpoint DDR5 | 250 ns | 1 TB |
| Tier 2 | Endpoint NVMe | 25 μs | 16 TB |
| Fallback | Recompute | 50 ms | n/a |

4.2 Effective Latency Formula

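$$L_{\text{eff}} = \alpha \, L_{\text{dram}} + \beta \, L_{\text{flash}} + \gamma \, L_{\text{recompute}}$$

This is a weighted average across the cache tiers: $\alpha$ is the fraction of accesses served from endpoint DRAM (Tier 1), $\beta$ the fraction served from flash (Tier 2), and $\gamma$ the fraction that miss both tiers and fall back to recompute, with $\alpha + \beta + \gamma = 1$.

With an 85% DRAM hit rate, a 14% flash hit rate, and a 1% miss rate:

$$L_{\text{eff}} = 0.85 \times 250\ \text{ns} + 0.14 \times 25\ \mu\text{s} + 0.01 \times 50\ \text{ms}$$

$$L_{\text{eff}} = 212.5\ \text{ns} + 3.5\ \mu\text{s} + 500\ \mu\text{s} \approx 504\ \mu\text{s} \approx 0.5\ \text{ms}$$

The 1% recompute path contributes 500 μs of the roughly 504 μs total, so the miss rate dominates the effective latency.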
Figure 4.1: Effective Latency Analysis
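The weighted average in Section 4.2 can be checked with a few lines of Python. This is a minimal sketch, not code from the chapter: the names `TIER_LATENCY_NS`, `effective_latency_ns`, and `tier_contributions_ns` are hypothetical, and the latencies are the values from the tier table converted to nanoseconds.

```python
# Latencies from the tier table, converted to nanoseconds.
# All names here are illustrative, not from the chapter.
TIER_LATENCY_NS = {
    "dram": 250.0,               # Tier 1: endpoint DDR5, 250 ns
    "flash": 25_000.0,           # Tier 2: endpoint NVMe, 25 us
    "recompute": 50_000_000.0,   # Fallback: recompute, 50 ms
}


def tier_contributions_ns(fractions, latencies=TIER_LATENCY_NS):
    """Return each tier's term of the weighted sum, in nanoseconds."""
    return {tier: fractions[tier] * latencies[tier] for tier in fractions}


def effective_latency_ns(fractions, latencies=TIER_LATENCY_NS):
    """Weighted-average latency; the access fractions must sum to 1."""
    total = sum(fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"fractions sum to {total}, expected 1.0")
    return sum(tier_contributions_ns(fractions, latencies).values())


# Hit rates from the worked example: 85% DRAM, 14% flash, 1% miss.
fractions = {"dram": 0.85, "flash": 0.14, "recompute": 0.01}

l_eff = effective_latency_ns(fractions)
shares = {
    tier: term / l_eff
    for tier, term in tier_contributions_ns(fractions).items()
}

print(f"L_eff = {l_eff / 1000:.1f} us")
for tier, share in shares.items():
    print(f"{tier:>9}: {share:.2%} of effective latency")
```

Running it reproduces the roughly 504 μs result and shows that the recompute term makes up more than 99% of it, which suggests that reducing the miss rate matters far more than shaving DRAM or flash latency.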

4.3 CXL.mem vs PCIe DMA Path Comparison

| Component | CXL.mem | PCIe DMA |
|---|---|---|
| CPU involvement | None | Required (interrupt) |
| Protocol overhead | 100 ns | 2-5 μs |
| Memory access | 100 ns | 100 ns |
| TLB management | Hardware | Software (1-2 μs) |
| Total | ~250 ns | 5-16 μs |
Figure 4.2: Latency Path Comparison
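The comparison can be made concrete by summing the rows that carry a number and setting the result against the stated totals. The sketch below is illustrative only: the dictionary and function names are hypothetical, and rows without a figure (CPU involvement, hardware TLB management) are left out of the sums rather than guessed.

```python
# Itemized costs from the comparison table as (low, high) ranges in ns.
# Only rows with a stated figure are included; names are illustrative.
CXL_MEM_NS = {
    "protocol_overhead": (100, 100),
    "memory_access": (100, 100),
}
PCIE_DMA_NS = {
    "protocol_overhead": (2_000, 5_000),   # 2-5 us
    "memory_access": (100, 100),
    "tlb_management": (1_000, 2_000),      # software, 1-2 us
}

# Totals as stated in the table's last row.
STATED_TOTAL_NS = {"cxl": (250, 250), "pcie": (5_000, 16_000)}


def itemized_range(components):
    """Sum the low and high ends of every itemized component."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


cxl_sum = itemized_range(CXL_MEM_NS)
pcie_sum = itemized_range(PCIE_DMA_NS)

# Speedup of CXL.mem over PCIe DMA at each end of the stated PCIe range.
cxl_total = STATED_TOTAL_NS["cxl"][0]
low_speedup = STATED_TOTAL_NS["pcie"][0] / cxl_total
high_speedup = STATED_TOTAL_NS["pcie"][1] / cxl_total

print(f"CXL.mem itemized:  {cxl_sum[0]} ns (stated ~{cxl_total} ns)")
print(f"PCIe DMA itemized: {pcie_sum[0]}-{pcie_sum[1]} ns "
      f"(stated {STATED_TOTAL_NS['pcie'][0]}-{STATED_TOTAL_NS['pcie'][1]} ns)")
print(f"Speedup range: {low_speedup:.0f}x to {high_speedup:.0f}x")
```

The itemized rows add up to 200 ns for CXL.mem and 3.1-7.1 μs for PCIe DMA, both below the stated totals. The remainder is presumably cost the table does not break out, such as interrupt handling and DMA setup on the PCIe path. Using the stated totals, the speedup spans 20× to 64×, so the headline figure corresponds to the upper end of the PCIe range.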

4.4 Access Timeline (Waterfall)

Figure 4.3: Latency Waterfall Diagram
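65× Latency Improvement

CXL.mem eliminates CPU interrupt handling, explicit DMA setup, and software TLB management. Result: about 250 ns versus 16+ μs at the upper end of the PCIe DMA range, roughly 65× faster. Against the 5 μs lower end of that range, the gain is about 20×.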