Using EMA to transform noisy attention into stable eviction priorities
Each decode step, every cached token receives an attention weight from the current query. These scores are noisy—a token might spike one step and drop the next.
| Step | new_score | Calculation | score_ema |
|---|---|---|---|
| tâ‚ | 0.45 | 0.2 × 0.45 + 0.8 × 0.00 | 0.090 |
| tâ‚‚ | 0.12 | 0.2 × 0.12 + 0.8 × 0.135 | 0.096 |
| t₃ | 0.30 | 0.2 × 0.30 + 0.8 × 0.131 | 0.137 |
| tâ‚„ | 0.08 | 0.2 × 0.08 + 0.8 × 0.182 | 0.126 |
| tâ‚… | 0.35 | 0.2 × 0.35 + 0.8 × 0.151 | 0.171 |