Per-Head Tracking with GQA
Llama-70B: 64 Query Heads → 8 KV-Heads × 80 Layers = 640 LRU Queues
1. GQA Grouping Defines Tracking Unit
Q0–Q7 → KV-Head 0 → LRU Queue [layer, 0]
Q8–Q15 → KV-Head 1 → LRU Queue [layer, 1]
⋮
Q56–Q63 → KV-Head 7 → LRU Queue [layer, 7]
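The grouping above can be sketched in a few lines. This is a minimal illustration, assuming contiguous groups of 8 query heads per KV head as the diagram suggests; the names kv_head_for and queue_key are hypothetical and do not come from any particular library.

```python
from __future__ import annotations

NUM_LAYERS = 80      # Llama-70B layer count, from the figures above
NUM_Q_HEADS = 64     # query heads per layer
NUM_KV_HEADS = 8     # KV heads per layer
GROUP_SIZE = NUM_Q_HEADS // NUM_KV_HEADS  # 8 query heads share one KV head


def kv_head_for(q_head: int) -> int:
    """Return the KV head serving a query head (assumes contiguous grouping)."""
    if not 0 <= q_head < NUM_Q_HEADS:
        raise ValueError(f"query head {q_head} out of range")
    return q_head // GROUP_SIZE


def queue_key(layer: int, q_head: int) -> tuple[int, int]:
    """Return the (layer, kv_head) key of the LRU queue that tracks this query head."""
    if not 0 <= layer < NUM_LAYERS:
        raise ValueError(f"layer {layer} out of range")
    return (layer, kv_head_for(q_head))


# Every (layer, query head) pair collapses onto one of the distinct queue keys.
all_queue_keys = {
    queue_key(layer, q) for layer in range(NUM_LAYERS) for q in range(NUM_Q_HEADS)
}
```

Because tracking is keyed by KV head rather than query head, the number of queues is 8 times smaller than it would be with one queue per query head.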
2. 640 Independent LRU Queues
3. What Each Queue Tracks
pos: 127891, access: 847, attn: 0.92
→ pos: 0, access: 512, attn: 0.88
→ pos: 1024, access: 234, attn: 0.45
→ pos: 89012, access: 89, attn: 0.31
→ pos: 45678, access: 12, attn: 0.08
→ EVICT CANDIDATES
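One way a single queue might behave is sketched below. The source does not state the exact ordering rule or how the attention score is updated, so this sketch assumes plain least-recently-used ordering and simply stores the latest score; HeadLRUQueue, Entry, touch, and evict_candidates are hypothetical names chosen for illustration.

```python
from __future__ import annotations

from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class Entry:
    position_id: int
    access_count: int = 0
    attention_score: float = 0.0


class HeadLRUQueue:
    """LRU queue for one (layer, kv_head) pair; an illustrative sketch only."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries: OrderedDict[int, Entry] = OrderedDict()

    def touch(self, position_id: int, attention_score: float) -> list[int]:
        """Record an access and return any positions evicted to stay within capacity."""
        entry = self._entries.get(position_id)
        if entry is None:
            entry = Entry(position_id)
            self._entries[position_id] = entry
        entry.access_count += 1
        entry.attention_score = attention_score  # assumption: keep the latest score
        self._entries.move_to_end(position_id)   # most recently used sits at the end
        evicted: list[int] = []
        while len(self._entries) > self.capacity:
            old_position, _ = self._entries.popitem(last=False)  # least recently used
            evicted.append(old_position)
        return evicted

    def evict_candidates(self, n: int) -> list[int]:
        """Return up to n least recently used positions without removing them."""
        return list(self._entries)[:n]


queue = HeadLRUQueue(capacity=3)
queue.touch(0, 0.88)
queue.touch(1024, 0.45)
queue.touch(127891, 0.92)
queue.touch(0, 0.90)                  # position 0 becomes most recently used
candidates = queue.evict_candidates(1)
evicted = queue.touch(89012, 0.31)    # a fourth position forces one eviction
```

A real policy could also weigh access_count or attention_score when choosing candidates; the sketch ignores them for ordering and only records them.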
Per-Entry Metadata (8 bytes)
position_id — u32 (4B)
access_count — u16 (2B)
attention_score — fp16 (2B)
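The 8-byte layout can be checked with Python's struct module, whose "e" format character encodes an IEEE 754 half-precision float. This is a sketch under assumptions: little-endian byte order and saturating the access counter at 65,535 are choices made here, not details given in the source, and pack_entry and unpack_entry are hypothetical helper names.

```python
import struct

# Little-endian (an assumption): u32 position_id, u16 access_count, fp16 attention_score
ENTRY_FORMAT = "<IHe"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 4 + 2 + 2 bytes


def pack_entry(position_id: int, access_count: int, attention_score: float) -> bytes:
    """Pack one queue entry into its fixed-size binary form."""
    access_count = min(access_count, 0xFFFF)  # assumption: saturate instead of wrapping
    return struct.pack(ENTRY_FORMAT, position_id, access_count, attention_score)


def unpack_entry(raw: bytes) -> tuple:
    """Unpack bytes back into (position_id, access_count, attention_score)."""
    return struct.unpack(ENTRY_FORMAT, raw)


raw = pack_entry(127891, 847, 0.92)
position_id, access_count, attention_score = unpack_entry(raw)
```

The attention score comes back rounded to fp16 precision, so it matches 0.92 only approximately.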
640 queues × 131,072 positions × 8 bytes = 671,088,640 bytes = 640 MiB metadata
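The budget arithmetic can be reproduced directly. The helper below is a hypothetical convenience function, assuming every queue tracks the full 131,072-position context at 8 bytes per entry; it also shows that the total is 640 binary megabytes (MiB), which is about 671 million bytes.

```python
def metadata_bytes(num_layers: int, num_kv_heads: int,
                   max_positions: int, entry_bytes: int = 8) -> int:
    """Total tracking metadata if every queue holds an entry for every position."""
    num_queues = num_layers * num_kv_heads
    return num_queues * max_positions * entry_bytes


total = metadata_bytes(num_layers=80, num_kv_heads=8, max_positions=131_072)
total_mib = total / 2**20  # convert bytes to MiB
```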