Rotation mechanics, frequency spectra, and distance-dependent attention decay.
RoPE encodes position by rotating Q and K vectors in 2D subspaces. Each consecutive pair (x_{2i}, x_{2i+1}) at position m is multiplied by the rotation matrix

    R(m·θ_i) = [ cos(m·θ_i)  −sin(m·θ_i) ]
               [ sin(m·θ_i)   cos(m·θ_i) ]

where m is the position and θ_i = 10000^(−2i/d) for dimension pair i, with d the head dimension.
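A minimal NumPy sketch of this rotation, assuming a single head vector of even dimension (the function name `rope_rotate` is ours, not a library API):

```python
import numpy as np

def rope_rotate(x: np.ndarray, m: int, base: float = 10000.0) -> np.ndarray:
    """Rotate each 2D pair (x[2i], x[2i+1]) by angle m * theta_i."""
    d = x.shape[-1]
    theta = base ** (-2.0 * np.arange(d // 2) / d)  # theta_i = base^(-2i/d)
    cos, sin = np.cos(m * theta), np.sin(m * theta)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos - x[1::2] * sin
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out

x = np.random.default_rng(0).standard_normal(64)
y = rope_rotate(x, 7)  # pure rotation: the vector's norm is preserved
```

Because each 2D block is a pure rotation, the transform preserves vector norms, so it changes only the phase information that attention later compares.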
The dot product between positions m and n then depends only on the content vectors and their relative offset:

    ⟨f_q(x_m, m), f_k(x_n, n)⟩ = g(x_m, x_n, m − n)
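This relative-position property can be checked numerically: shifting both positions by the same amount leaves the score unchanged. A small sketch (the helper `rope` is ours, using the equivalent complex-number form of the rotation):

```python
import numpy as np

def rope(x: np.ndarray, m: int, base: float = 10000.0) -> np.ndarray:
    # Treat consecutive pairs as complex numbers; rotate by e^{i * m * theta_i}.
    d = x.shape[-1]
    theta = base ** (-2.0 * np.arange(d // 2) / d)
    z = (x[0::2] + 1j * x[1::2]) * np.exp(1j * m * theta)
    out = np.empty_like(x)
    out[0::2], out[1::2] = z.real, z.imag
    return out

rng = np.random.default_rng(1)
q, k = rng.standard_normal(64), rng.standard_normal(64)

# Same relative offset m - n = -7, absolute positions shifted by 1000:
s_near = rope(q, 5) @ rope(k, 12)
s_far = rope(q, 1005) @ rope(k, 1012)
```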
Because the magnitude of this inner product is upper-bounded by a quantity that shrinks as the relative distance |m − n| grows, attention naturally decays with distance. This makes attention patterns predictable: recent tokens tend to receive systematically higher scores than distant ones.
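One way to visualize the decay (a heuristic proxy, not the exact bound from the RoPE paper): the magnitude of the summed phasors |Σ_j e^(i·s·θ_j)| is largest at relative distance s = 0, where all terms align, and the terms increasingly cancel as s grows.

```python
import numpy as np

d, base = 64, 10000.0
theta = base ** (-2.0 * np.arange(d // 2) / d)

# |sum_j exp(i * s * theta_j)| as a rough proxy for the attention-score
# upper bound at relative distance s.
s = np.arange(512)
bound = np.abs(np.exp(1j * np.outer(s, theta)).sum(axis=1))
```

The curve is not strictly monotone, but its envelope falls off with distance, which is the regularity the prefetcher relies on.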
We exploit this for RoPE-aware prefetching — tokens near the current decode position are prefetched with higher priority.
Combined with EMA scores, this achieves prefetch hit rates above 90%.
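A minimal sketch of how such a ranking could combine the two signals. All names here (`Token`, `priority`, `beta`, the 1/distance recency term) are illustrative assumptions, not the system's actual implementation:

```python
from dataclasses import dataclass

@dataclass
class Token:
    pos: int     # absolute position in the sequence
    ema: float   # EMA of this token's past attention mass, in [0, 1]

def priority(t: Token, decode_pos: int, beta: float = 0.5) -> float:
    # Blend a RoPE-style recency term (higher near the current decode
    # position) with the token's historical importance (EMA score).
    dist = max(decode_pos - t.pos, 1)
    recency = 1.0 / dist  # stand-in for the decaying attention bound
    return beta * recency + (1.0 - beta) * t.ema

def prefetch_order(tokens: list[Token], decode_pos: int, k: int) -> list[Token]:
    # Prefetch the k highest-priority offloaded tokens first.
    return sorted(tokens, key=lambda t: priority(t, decode_pos), reverse=True)[:k]

toks = [Token(pos=990, ema=0.1), Token(pos=100, ema=0.1), Token(pos=100, ema=0.9)]
top = prefetch_order(toks, decode_pos=1000, k=2)
```

Under this weighting, a distant token with a high EMA score can still outrank a nearby low-importance one, which is what lets the prefetcher keep historically "heavy" tokens warm.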