G.1 EMA Update Rule
$$\text{score}_t(p) = \alpha \cdot \text{attention}_t(p) + (1 - \alpha) \cdot \text{score}_{t-1}(p)$$
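Here $p$ is a token position, $t$ the step index, and $\alpha \in (0, 1)$ the smoothing factor. Unrolling the recursion, attention observed $k$ steps ago carries weight $\alpha(1 - \alpha)^k$, so the score acts as a "memory" of attention importance that decays gradually instead of being reset at every step.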
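The update rule can be sketched in Python as follows. This is a minimal sketch, assuming scores are kept in a dict keyed by token position, that a position absent from a step's attention received zero attention, and that unseen positions start from a score of 0 (the initialization is not specified above). The function name `ema_update` is hypothetical.

```python
from typing import Dict


def ema_update(scores: Dict[int, float],
               attention: Dict[int, float],
               alpha: float = 0.1) -> Dict[int, float]:
    """Return alpha * attention_t(p) + (1 - alpha) * score_{t-1}(p) for every position.

    Assumptions: a position missing from `attention` received zero attention
    this step, and a position missing from `scores` had a previous score of 0.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    positions = scores.keys() | attention.keys()  # union of dict key views -> set
    return {
        p: alpha * attention.get(p, 0.0) + (1.0 - alpha) * scores.get(p, 0.0)
        for p in positions
    }


# One step: position 5 again receives 4% of the attention; position 9 receives none.
scores = {5: 0.04, 9: 0.20}
scores = ema_update(scores, {5: 0.04}, alpha=0.1)
print(scores)
```

Position 5 stays at 0.04 because its attention already equals its score, while position 9 loses 10% of its score for the step it went unattended.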
G.2 Properties
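- Half-life: $t_{1/2} = -\ln 2 / \ln(1 - \alpha)$, the number of steps without attention after which a score has fallen to half its value.
- Steady state: under constant attention $a$, the score converges to $a$, the fixed point of $s = \alpha a + (1 - \alpha)s$.
- Decay: without attention, the score shrinks geometrically, $\text{score}_{t+k}(p) = (1 - \alpha)^k \, \text{score}_t(p)$, approaching 0.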
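The three properties can be checked numerically with a short sketch. The helper names `half_life` and `run` are illustrative, and α = 0.1 with a constant attention of 0.04 are example values borrowed from G.3 and G.4.

```python
import math


def half_life(alpha: float) -> float:
    """Steps until an unattended score halves: -ln(2) / ln(1 - alpha)."""
    return -math.log(2) / math.log(1 - alpha)


def run(alpha: float, attention: float, steps: int, score: float = 0.0) -> float:
    """Apply the EMA update `steps` times with the same attention value."""
    for _ in range(steps):
        score = alpha * attention + (1 - alpha) * score
    return score


alpha = 0.1

# Steady state: constant attention a drives the score toward a.
steady = run(alpha, attention=0.04, steps=200)

# Decay: with zero attention, score_t = (1 - alpha)**t * score_0.
decayed = run(alpha, attention=0.0, steps=10, score=1.0)

# Half-life: after t_1/2 steps without attention the score has halved.
h = half_life(alpha)
halved = (1 - alpha) ** h

print(f"steady={steady:.6f} decayed={decayed:.6f} "
      f"closed_form={(1 - alpha) ** 10:.6f} half_life={h:.2f} halved={halved:.3f}")
```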
G.3 α Selection
| α | Half-Life (steps) | Use Case |
|---|---|---|
| 0.05 | 13.5 | Very stable, long memory |
| 0.1 | 6.6 | Recommended default |
| 0.2 | 3.1 | Faster adaptation |
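The half-life column can be recomputed from the formula in G.2, and inverting that formula gives the α that produces a desired half-life, $\alpha = 1 - 2^{-1/t_{1/2}}$. A small sketch; the helper names are hypothetical.

```python
import math


def half_life(alpha: float) -> float:
    """Half-life in steps for an EMA smoothing factor 0 < alpha < 1."""
    return -math.log(2) / math.log(1 - alpha)


def alpha_for_half_life(steps: float) -> float:
    """Inverse of half_life: solve (1 - alpha) ** steps = 1/2 for alpha."""
    return 1 - 2 ** (-1 / steps)


for alpha in (0.05, 0.1, 0.2):
    print(f"alpha={alpha:<5} half-life={half_life(alpha):.1f} steps")
# prints half-lives of 13.5, 6.6 and 3.1 steps, matching the table
```

`alpha_for_half_life` is handy when the desired memory length is known in steps rather than as a smoothing factor.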
G.4 EMA vs LRU
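- LRU problem: a system-prompt token at position 5 is evicted after 100 steps even though it receives 4% of the attention at every step.
- EMA solution: its score holds steady at 0.04, the steady state for constant attention of 0.04, which keeps it cached.
Result: +15% cache hit rate compared with LRU.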
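A minimal sketch of eviction by EMA score, assuming a fixed-capacity cache that evicts the lowest-scoring position whenever it is over capacity. The class name, the capacity of 4, the one-off 50% attention spikes, and seeding position 5 at its steady-state score are all hypothetical choices for illustration. This toy does not simulate LRU or reproduce the +15% figure. It only shows why a steadily attended token survives: each spiked position briefly outranks position 5 (peak score 0.05) but drops below 0.04 three steps later.

```python
from typing import Dict


class EMAEvictionCache:
    """Hypothetical cache bookkeeping: keep at most `capacity` positions and,
    when over capacity, evict the position with the lowest EMA attention score."""

    def __init__(self, capacity: int, alpha: float = 0.1) -> None:
        self.capacity = capacity
        self.alpha = alpha
        self.scores: Dict[int, float] = {}

    def step(self, attention: Dict[int, float]) -> None:
        """Apply the EMA update to every tracked position, then evict down to capacity."""
        for p in self.scores.keys() | attention.keys():  # set snapshot, safe to mutate dict
            prev = self.scores.get(p, 0.0)
            self.scores[p] = self.alpha * attention.get(p, 0.0) + (1 - self.alpha) * prev
        while len(self.scores) > self.capacity:
            victim = min(self.scores, key=self.scores.__getitem__)
            del self.scores[victim]


cache = EMAEvictionCache(capacity=4, alpha=0.1)
cache.scores[5] = 0.04  # assume position 5 has already reached its steady state

# Each step, position 5 gets 4% of the attention, and a brand-new position
# gets a one-off spike of 50% attention that is never repeated.
for t in range(100):
    cache.step({5: 0.04, 1000 + t: 0.5})

print(sorted(cache.scores))  # prints [5, 1097, 1098, 1099]
```

Starting position 5 at a score of 0 instead would leave it below the spikes during warm-up, so a real policy would need some warm-up handling. The sketch sidesteps this by assumption.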