EMA Attention Scoring

The EMA Update Rule

score_new = 0.2 × attention + 0.8 × score_old

At each decode step, we update every token's score using an exponential moving average (EMA).
Tokens that consistently receive attention accumulate high scores.
Tokens that are ignored see their scores decay toward zero.
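
The update rule above can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the names `ema_update` and `ALPHA` are assumptions introduced here, and the attention values are taken from the first two steps of the "helpful" example.

```python
ALPHA = 0.2  # smoothing factor from the update rule (name is hypothetical)


def ema_update(score_old: float, attention: float, alpha: float = ALPHA) -> float:
    """One EMA step: blend the new attention weight into the running score."""
    return alpha * attention + (1.0 - alpha) * score_old


# Replay the first two decode steps for the token "helpful".
score = 0.0
for attention in (0.04, 0.05):
    score = ema_update(score, attention)
    print(round(score, 4))  # prints 0.008, then 0.0164
```

Because each step multiplies the old score by 0.8, a token that stops receiving attention loses 20% of its score per step, which is the decay behavior described above.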

Token: "helpful" (instruction), position 45

Step   Calculation                  Score
1      0.2 × 0.04 + 0.8 × 0         0.008
2      0.2 × 0.05 + 0.8 × 0.008     0.016
10     0.2 × 0.03 + 0.8 × 0.042     0.040
50     0.2 × 0.04 + 0.8 × 0.11      0.096
100    0.2 × 0.05 + 0.8 × 0.14      0.122

Final score after 100 steps: 0.122 → KEEP

Token: "the" (filler word), position 5,432

Step   Calculation                  Score
1      0.2 × 0.002 + 0.8 × 0        0.0004
2      0.2 × 0.001 + 0.8 × 0.0004   0.0005
10     0.2 × 0.001 + 0.8 × 0.0008   0.0008
50     0.2 × 0.000 + 0.8 × 0.0006   0.0005
100    0.2 × 0.001 + 0.8 × 0.0004   0.0005

Final score after 100 steps: 0.0005 → EVICT
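
The keep/evict outcome for the two tokens can be illustrated with a simple score cutoff. The source does not say how the decision is made (it could just as well be a top-k ranking under a cache budget), so the threshold value `EVICT_THRESHOLD` and the function `decide` below are purely hypothetical, chosen only so that both worked examples land on the side shown above.

```python
EVICT_THRESHOLD = 0.01  # hypothetical cutoff; the source does not specify one


def decide(score: float, threshold: float = EVICT_THRESHOLD) -> str:
    """Return "KEEP" if the EMA score meets the cutoff, otherwise "EVICT"."""
    return "KEEP" if score >= threshold else "EVICT"


# Final scores after 100 steps, taken from the two worked examples.
final_scores = {"helpful": 0.122, "the": 0.0005}
decisions = {token: decide(score) for token, score in final_scores.items()}
print(decisions)
```

The two final scores differ by more than two orders of magnitude, so any cutoff in a wide range between them would give the same result.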

Why α = 0.2? The Half-Life Calculation

(1 - α)^n = 0.5
0.8^n = 0.5
n = log(0.5) / log(0.8)
n ≈ 3.1 steps

At 20 tokens/second (50 ms per decode step), the half-life is about 155 milliseconds.

This means that if a token receives no attention for about 155 ms (roughly three decode steps), its score drops by 50%. Tokens that are consistently important maintain high scores; tokens that were briefly accessed and then ignored see their scores decay quickly.
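
The half-life arithmetic can be checked with a short Python sketch. The function names `half_life_steps` and `half_life_ms` are assumptions made for this illustration; the decode rate of 20 tokens/second is the figure quoted above.

```python
import math


def half_life_steps(alpha: float) -> float:
    """Number of decode steps for an unattended score to halve: (1 - alpha)^n = 0.5."""
    return math.log(0.5) / math.log(1.0 - alpha)


def half_life_ms(alpha: float, tokens_per_second: float) -> float:
    """Convert the half-life from decode steps to milliseconds."""
    return half_life_steps(alpha) * 1000.0 / tokens_per_second


steps = half_life_steps(0.2)   # about 3.1 steps
ms = half_life_ms(0.2, 20.0)   # about 155 ms
print(f"{steps:.1f} steps, {ms:.0f} ms")
```

Trying other values shows the trade-off: a smaller α lengthens the half-life, so scores remember older attention for longer, while a larger α makes them react faster to recent steps.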
