Summary of contributions and key takeaways.
CXL 3.0 computational storage endpoints with controller-resident intelligence for autonomous cache management.
Per-head importance tracking, EMA-based scoring, and RoPE-aware prefetching achieving 95% HBM hit rate.
CXL.mem load/store semantics eliminate CPU interrupt handling for 250 ns vs 16 μs access.
36% infrastructure cost reduction, 8× more concurrent users per GPU, enabling profitable long-context serving.