© 2025 Subramaniyam (Sam) Pooni
All Rights Reserved
Proprietary & Confidential
Appendix E

Attention Head Specialization

Recency heads, anchor heads, retrieval heads, and syntactic heads.

E.1 Head Types Overview

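Interpretability research indicates that attention heads specialize into distinct functional roles during training. The table below lists the four types, their approximate share of heads, their attention pattern, and what each implies for KV-cache retention: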

Type       % of Heads   Attention Pattern                  Cache Implication
Recency    ~40%         Last 50-200 tokens                 Keep recent context hot
Anchor     ~15%         Positions 0-100 (system prompt)    Pin anchor zone permanently
Retrieval  ~25%         Content-based lookup               Use EMA scoring
Syntactic  ~20%         Grammar patterns                   Sparse, pattern-based
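The cache implications in the table can be expressed as a per-head importance function. The sketch below is a minimal illustration, assuming each head has already been labeled with a type. The names `HeadType`, `ema_update`, and `head_importance`, and the constants RECENT_WINDOW = 128, ANCHOR_LEN = 100, and EMA_DECAY = 0.9, are hypothetical choices for illustration, not values specified in this appendix. Syntactic heads are routed to the EMA score only as a placeholder, because the table describes them just as sparse and pattern-based.

```python
from enum import Enum


class HeadType(Enum):
    RECENCY = "recency"
    ANCHOR = "anchor"
    RETRIEVAL = "retrieval"
    SYNTACTIC = "syntactic"


# Illustrative parameters (assumptions, not taken from this appendix).
RECENT_WINDOW = 128  # inside the ~50-200 token range listed for recency heads
ANCHOR_LEN = 100     # first ANCHOR_LEN positions form the pinned anchor zone
EMA_DECAY = 0.9      # weight kept on the previous running score


def ema_update(scores, attention, decay=EMA_DECAY):
    """Blend one decoding step's attention weights into running scores.

    Both lists are indexed by cache position. Positions newly appended to
    the cache (attention longer than scores) start from a score of 0.0.
    """
    scores = scores + [0.0] * (len(attention) - len(scores))
    return [decay * s + (1.0 - decay) * a for s, a in zip(scores, attention)]


def head_importance(head_type, position, seq_len, ema_score):
    """Importance of one cache position for one head, under the table's policy."""
    if head_type is HeadType.RECENCY:
        # Keep the most recent RECENT_WINDOW tokens hot.
        return 1.0 if position >= seq_len - RECENT_WINDOW else 0.0
    if head_type is HeadType.ANCHOR:
        # Pin the anchor zone permanently, regardless of recent attention.
        return 1.0 if position < ANCHOR_LEN else 0.0
    # Retrieval heads (and, as a placeholder assumption, syntactic heads)
    # use the content-based EMA score accumulated by ema_update.
    return ema_score
```

The EMA lets a retrieval head's interest in a token persist across steps where it happens not to attend, while the recency and anchor rules need no history at all. The per-head scores produced this way are what the aggregation rule in E.3 combines.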

E.2 Why Per-Head Tracking Matters

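A token might be:

- ignored by most heads, so its attention averaged across heads is low, and
- critical to a single head, for example a retrieval head that looks it up by content.

Token-level eviction, which scores each token once across all heads, would incorrectly evict this token. Per-head tracking preserves it.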
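To make the failure concrete, the snippet below uses hypothetical attention weights for one token across eight heads: seven heads barely attend to it, while one head depends on it heavily. "Token-level" scoring is modeled here as the mean across heads (an assumption; any pooling that dilutes a single head's signal behaves similarly), and the threshold of 0.1 is illustrative.

```python
# Hypothetical attention one token receives from each of 8 heads.
# Seven heads mostly ignore it; head 5 (e.g. a retrieval head) relies on it.
per_head_attention = [0.01, 0.02, 0.01, 0.00, 0.02, 0.60, 0.01, 0.01]
THRESHOLD = 0.1  # illustrative eviction threshold

# Token-level scoring: collapse all heads into a single averaged score.
token_level_score = sum(per_head_attention) / len(per_head_attention)

# Per-head scoring: the token matters if any single head needs it.
per_head_score = max(per_head_attention)

print(f"token-level score = {token_level_score:.3f}, keep = {token_level_score > THRESHOLD}")
print(f"per-head score    = {per_head_score:.3f}, keep = {per_head_score > THRESHOLD}")
```

The average (about 0.085) falls below the threshold, so token-level eviction drops the token, while the per-head maximum (0.60) clears it easily.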

E.3 Aggregation Rule

$$\mathrm{Keep}(\text{position}) = \max_{h}\big(\mathrm{importance}_{h}(\text{position})\big) > \text{threshold}$$

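A position survives eviction if at least one head h scores it above the threshold; it is evicted only when every head considers it unimportant.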
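The aggregation rule can be sketched directly as a function over a heads-by-positions score matrix. This is a minimal illustration, assuming per-head importance scores are already available as nested lists; the function name `keep_mask` and the example scores are hypothetical.

```python
from typing import Sequence


def keep_mask(importance: Sequence[Sequence[float]], threshold: float) -> list[bool]:
    """Apply Keep(position) = max_h importance_h(position) > threshold.

    importance[h][p] is head h's importance score for cache position p.
    Returns one boolean per position; True means the position is kept.
    """
    if not importance:
        return []
    num_positions = len(importance[0])
    if any(len(row) != num_positions for row in importance):
        raise ValueError("every head must score the same number of positions")
    # A position survives if at least one head scores it above the threshold.
    return [
        max(head[p] for head in importance) > threshold
        for p in range(num_positions)
    ]


# Usage: 3 heads x 4 positions (hypothetical scores).
scores = [
    [0.9, 0.0, 0.0, 0.1],   # anchor-like head: needs position 0
    [0.0, 0.0, 0.05, 0.8],  # recency-like head: needs the latest position
    [0.0, 0.0, 0.7, 0.0],   # retrieval-like head: needs position 2
]
print(keep_mask(scores, threshold=0.5))  # [True, False, True, True]
```

Only position 1, which no head scores above 0.5, is evicted. With NumPy, the same rule on a heads-by-positions array is `importance.max(axis=0) > threshold`.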