© 2025 Subramaniyam (Sam) Pooni
All Rights Reserved
Proprietary & Confidential
Appendix D

Rotary Position Embeddings (RoPE)

Rotation mechanics, frequency spectra, and distance-dependent attention decay.

D.1 Core Concept

RoPE encodes position by rotating each consecutive pair of Q and K dimensions in its own 2D subspace:

RoPE(x, m) = x ⊙ cos(mθ) + rotate(x) ⊙ sin(mθ)

where m is the token position, ⊙ is element-wise multiplication, θi = 10000^(-2i/d) is the frequency assigned to dimension pair i, and rotate(x) maps each pair (x_2i, x_2i+1) to (-x_2i+1, x_2i).
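The per-pair rotation can be sketched in a few lines of pure Python. This is a minimal illustration of the formula above, not any particular library's implementation; the pairing of adjacent dimensions (x[2i], x[2i+1]) follows the standard RoPE formulation.

```python
import math

def rope(x, m, base=10000.0):
    """Rotate each consecutive pair (x[2i], x[2i+1]) of vector x
    by the angle m * theta_i, where theta_i = base ** (-2i / d)."""
    d = len(x)
    assert d % 2 == 0, "embedding dimension must be even"
    out = [0.0] * d
    for i in range(d // 2):
        theta = base ** (-2.0 * i / d)          # per-pair frequency
        c, s = math.cos(m * theta), math.sin(m * theta)
        out[2 * i]     = x[2 * i] * c - x[2 * i + 1] * s  # 2D rotation
        out[2 * i + 1] = x[2 * i] * s + x[2 * i + 1] * c
    return out
```

Position 0 leaves the vector unchanged, and because each pair undergoes a pure rotation, the dot product of two rotated vectors depends only on their position difference.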

D.2 Position-Dependent Attention

The dot product between positions m and n depends only on their relative offset:

qm · kn = Σi (ai cos((m-n)θi) + bi sin((m-n)θi))

where ai and bi depend only on the contents of q and k. Averaged over the frequency spectrum, this score decays with the distance |m-n|, so attention naturally concentrates on nearby tokens.
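The decay can be seen by averaging cos((m-n)θi) over the frequency spectrum; this small sketch (assumed head dimension d=64 and the standard base of 10000) tracks the expected score between two tokens with random, uncorrelated contents:

```python
import math

def mean_cos(delta, d=64, base=10000.0):
    """Average of cos(delta * theta_i) over the d/2 RoPE frequencies.

    For query/key vectors with random contents, this is proportional to
    the expected dot product between tokens `delta` positions apart.
    """
    freqs = [base ** (-2.0 * i / d) for i in range(d // 2)]
    return sum(math.cos(delta * f) for f in freqs) / len(freqs)
```

At offset 0 the average is exactly 1; as the offset grows, the frequencies interfere destructively and the average falls off, which is the distance-dependent decay exploited in the next section.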

D.3 Locality Exploitation

Because attention mass concentrates near the current decode position, KV-cache access patterns become predictable: recently generated tokens are read far more often than distant ones.

We exploit this for RoPE-aware prefetching: tokens near the current decode position are prefetched with higher priority.

D.4 Prefetch Window Strategy

Prefetch_priority(p) ∝ 1 / (1 + |current_pos - p|)

The +1 in the denominator keeps the priority finite at the current decode position itself.

Combined with EMA scores, this achieves 90%+ prefetch hit rates.
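One way to combine the two signals is a simple weighted blend; the function name, the blend weight alpha, and the eps smoothing term below are illustrative choices, not the production scoring function:

```python
def prefetch_priority(page_pos, current_pos, ema_score, alpha=0.5, eps=1.0):
    """Blend RoPE locality with an EMA reuse score (illustrative weights).

    locality ~ 1 / (eps + |current_pos - page_pos|); eps avoids division
    by zero for the page holding the current decode position.
    ema_score is assumed to be normalized to [0, 1].
    """
    locality = 1.0 / (eps + abs(current_pos - page_pos))
    return alpha * locality + (1.0 - alpha) * ema_score
```

Pages near the decode position rank highest even with a cold EMA score, while a hot EMA score can promote a distant page (e.g. an attention-sink prefix) into the prefetch window.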