Balancing KV-cache memory efficiency with model quality
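KV-cache size in bytes (per sequence) = 2 × n_layers × n_kv_heads × seq_len × head_dim × dtype_size
where the factor 2 counts keys and values, and dtype_size is the number of bytes per element (for example, 2 for fp16).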
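
To make the formula above concrete, here is a minimal Python sketch that evaluates it directly. The function name `kv_cache_bytes` and the configuration values (32 layers, head_dim 128, 4096 tokens, 2-byte elements) are assumptions chosen for illustration, not taken from the source. It compares 32 KV heads against 8 KV heads to show how reducing n_kv_heads shrinks the cache linearly.

```python
def kv_cache_bytes(n_layers, n_kv_heads, seq_len, head_dim, dtype_size):
    """Bytes of KV cache for one sequence; the leading 2 covers keys and values."""
    return 2 * n_layers * n_kv_heads * seq_len * head_dim * dtype_size


# Hypothetical configuration, chosen only for illustration.
full_heads = kv_cache_bytes(n_layers=32, n_kv_heads=32, seq_len=4096,
                            head_dim=128, dtype_size=2)
fewer_heads = kv_cache_bytes(n_layers=32, n_kv_heads=8, seq_len=4096,
                             head_dim=128, dtype_size=2)

print(f"32 KV heads: {full_heads / 2**30:.2f} GiB")   # 2.00 GiB
print(f" 8 KV heads: {fewer_heads / 2**30:.2f} GiB")  # 0.50 GiB
```

The result is per sequence; for a batch, multiply by the number of sequences held in the cache at once.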