Step-by-step walkthrough of locality-aware KV-cache prefetching
RoPE causes attention to concentrate around the query position. In this example, with the query at position P=8, positions 5–11 (within ±3 of P) capture ~72% of the total attention mass.
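To make the locality measurement concrete, here is a small sketch that computes the fraction of attention mass inside a ±3 window of the query position. The attention profile is synthetic (an exponential decay over distance, chosen only to mimic the shape of the locality described above, so it will not reproduce the ~72% figure exactly), and `window_mass` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def window_mass(attn: np.ndarray, query_pos: int, radius: int = 3) -> float:
    """Fraction of total attention mass within +/-radius of query_pos."""
    lo = max(0, query_pos - radius)
    hi = min(len(attn) - 1, query_pos + radius)
    return float(attn[lo:hi + 1].sum() / attn.sum())

# Synthetic attention profile: mass decays with distance from the query.
# Illustrative only -- real profiles come from measured attention weights.
seq_len, query_pos = 16, 8
dist = np.abs(np.arange(seq_len) - query_pos)
attn = np.exp(-0.5 * dist)
attn /= attn.sum()

print(f"mass within +/-3 of P={query_pos}: {window_mass(attn, query_pos):.0%}")
```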
Because attention mass concentrates near the query position, we can predict which KV pairs the next decode step will need and fetch them from storage before the GPU stalls waiting for them. And since those positions are contiguous, each fetch covers a single sequential range rather than scattered blocks, transforming random storage access into sequential prefetch streams.
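Below is a minimal sketch of that prefetch loop, under two assumptions not spelled out above: KV blocks are laid out contiguously by position, so a window of positions maps to one sequential byte range, and a background thread stands in for whatever async I/O path a real system would use. `KVPrefetcher`, `BLOCK_BYTES`, and the in-memory `storage` buffer are all hypothetical names for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_BYTES = 4096   # bytes per KV block (hypothetical layout)
SEQ_LEN = 1024       # number of cached positions

# Stand-in for the storage tier: one contiguous buffer with blocks laid
# out in position order, so a window of positions is one sequential range.
storage = bytearray(SEQ_LEN * BLOCK_BYTES)

class KVPrefetcher:
    """Fetch the KV blocks around an upcoming query position ahead of time."""

    def __init__(self, radius: int = 3):
        self.radius = radius
        self.pool = ThreadPoolExecutor(max_workers=1)
        self.inflight = {}  # query position -> Future holding its window

    def _read_window(self, query_pos: int) -> bytes:
        lo = max(0, query_pos - self.radius)
        hi = min(SEQ_LEN - 1, query_pos + self.radius)
        # One sequential read covers the whole window: locality is what
        # turns scattered block lookups into a streaming access pattern.
        return bytes(storage[lo * BLOCK_BYTES:(hi + 1) * BLOCK_BYTES])

    def prefetch(self, query_pos: int) -> None:
        # Issue the read on a background thread while the GPU is busy
        # with the current step, so the data is ready when needed.
        self.inflight[query_pos] = self.pool.submit(self._read_window, query_pos)

    def get(self, query_pos: int) -> bytes:
        fut = self.inflight.pop(query_pos, None)
        if fut is not None:
            return fut.result()              # usually already done: no stall
        return self._read_window(query_pos)  # miss: blocking fallback read

# Decode loop: overlap the next window's fetch with this step's compute.
pf = KVPrefetcher(radius=3)
pf.prefetch(0)
for pos in range(4):
    window = pf.get(pos)   # KV blocks needed by attention at `pos`
    pf.prefetch(pos + 1)   # start fetching before the GPU needs it
    # ... attention compute for `pos` would run here ...
```

Overlapping `prefetch(pos + 1)` with the compute for `pos` is what hides the storage latency; the window radius trades read amplification against the chance that a needed block falls outside the prefetched range.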