09 Hardware Evolution

The DPU Journey

From BlueField-2 to BlueField-4: How each generation improved isolation capabilities and enabled new AI infrastructure possibilities.

BlueField-2 (2020): 200G Network
BlueField-3 (2022): 400G Network
BlueField-3S (2024): 400G SuperNIC
BlueField-4 (2025): 800G Network

Generation by Generation

Each BlueField generation brought significant improvements in processing power, network speed, and isolation capabilities.

2020

BlueField-2

The foundation. First production DPU with integrated ARM cores and hardware isolation capabilities.

8 ARM A72 cores | 200G network | 16GB DDR4 | 7nm process

2022

BlueField-3

Twice the network speed of BlueField-2 (400G vs. 200G), newer ARM A78 cores, and the DOCA SDK for programmable infrastructure.

16 ARM A78 cores | 400G network | 32GB DDR5 | 5nm process

2024

BlueField-3 SuperNIC

Optimized for AI workloads. Enhanced RDMA, GPU-direct support, and SHARP in-network computing.

8 ARM A78 cores | 400G network | SHARP in-network computing | AI-optimized

2025

BlueField-4

Next generation: 800G networking, enhanced isolation, and advanced AI acceleration.

24 ARM cores | 800G network | 64GB HBM | 3nm process

Detailed Comparison

Side-by-side technical specifications across all BlueField generations.

Generation Comparison

Specification     | BF-2      | BF-3      | BF-3S     | BF-4
Network Speed     | 200 Gbps  | 400 Gbps  | 400 Gbps  | 800 Gbps
CPU Cores         | 8× A72    | 16× A78   | 8× A78    | 24× A78AE
Memory            | 16GB DDR4 | 32GB DDR5 | 16GB DDR5 | 64GB HBM3
PCIe              | Gen 4     | Gen 5     | Gen 5     | Gen 6
Flow Entries      | 500K      | 1M        | 1M        | 4M
Crypto Throughput | 100 Gbps  | 200 Gbps  | 200 Gbps  | 400 Gbps
TDP               | 75W       | 150W      | 100W      | 200W
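
To make the generation-over-generation scaling concrete, the figures in the comparison table can be expressed as ratios against BF-2. The sketch below is only illustrative: the dictionary layout, key names, and the function `scaling_vs_baseline` are assumptions made for this example, and the numbers are copied from the table as given, not measured.

```python
# Values copied from the comparison table above. The dictionary layout and
# key names are illustrative assumptions, not part of any NVIDIA tooling.
SPECS = {
    "BF-2":  {"network_gbps": 200, "cpu_cores": 8,  "flow_entries": 500_000,
              "crypto_gbps": 100, "tdp_w": 75},
    "BF-3":  {"network_gbps": 400, "cpu_cores": 16, "flow_entries": 1_000_000,
              "crypto_gbps": 200, "tdp_w": 150},
    "BF-3S": {"network_gbps": 400, "cpu_cores": 8,  "flow_entries": 1_000_000,
              "crypto_gbps": 200, "tdp_w": 100},
    "BF-4":  {"network_gbps": 800, "cpu_cores": 24, "flow_entries": 4_000_000,
              "crypto_gbps": 400, "tdp_w": 200},
}


def scaling_vs_baseline(specs, baseline="BF-2"):
    """Return each metric of each generation as a multiple of the baseline."""
    base = specs[baseline]
    return {
        gen: {metric: value / base[metric] for metric, value in values.items()}
        for gen, values in specs.items()
    }


if __name__ == "__main__":
    for gen, ratios in scaling_vs_baseline(SPECS).items():
        print(gen, {metric: round(r, 2) for metric, r in ratios.items()})
```

Under these table values, BF-4 works out to 4x the network speed and 8x the flow entries of BF-2 at roughly 2.7x the TDP. The core-count ratio only counts cores; it says nothing about per-core performance, which also differs between core types.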

Performance Scaling

How key metrics have improved across generations.

[Chart: Performance Over Generations, comparing Network Speed, CPU Performance, and Flow Capacity across BF-2, BF-3, BF-3S, and BF-4]

Key Advancements

The most significant improvements introduced with each generation.

DOCA SDK

Unified programming framework for all DPU accelerators and data path functions.

SHARP

In-network computing for collective operations, reducing all-reduce latency by 10x.

NVMe-oF

Hardware-accelerated storage virtualization for composable infrastructure.

SR-IOV

Hardware virtualization with 512 virtual functions per port for dense multi-tenancy.
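
On Linux, virtual functions are typically enabled through the standard PCI sysfs attributes `sriov_totalvfs` and `sriov_numvfs`. The following is a minimal sketch of that flow, assuming a device directory that exposes both attributes; the function name `set_num_vfs` and the interface name in the usage comment are hypothetical, and the number of VFs a given card really offers depends on its firmware and driver configuration.

```python
from pathlib import Path


def set_num_vfs(device_dir, requested):
    """Enable up to `requested` VFs on the PCI device at `device_dir`.

    `device_dir` is the sysfs directory of the physical function, for
    example /sys/class/net/<interface>/device. Returns the VF count that
    was written, clamped to what the device reports as its maximum.
    """
    dev = Path(device_dir)
    total = int((dev / "sriov_totalvfs").read_text().strip())
    count = max(0, min(requested, total))

    numvfs = dev / "sriov_numvfs"
    current = int(numvfs.read_text().strip())
    if current == count:
        return count

    # The kernel rejects a change from one non-zero VF count to another,
    # so reset to 0 first when VFs are already enabled.
    if current != 0:
        numvfs.write_text("0")
    if count != 0:
        numvfs.write_text(str(count))
    return count


# Usage (requires root; "eth0" is a hypothetical interface name):
#   set_num_vfs("/sys/class/net/eth0/device", 512)
```

Clamping to `sriov_totalvfs` keeps the request within what the device advertises, so asking for 512 on a port configured for fewer VFs simply enables the maximum available.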

Inline Crypto

Line-rate encryption/decryption with zero CPU overhead.

Deep Telemetry

Per-flow counters, latency histograms, and real-time congestion detection.
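
As a rough software analogy for what per-flow counters and latency histograms record, the sketch below keeps packet counts, byte counts, and a bucketed latency histogram per flow key. It is a host-side illustration only: the class names, the 5-tuple flow key, and the bucket boundaries are assumptions for this example, and it does not represent the DPU's hardware implementation or any DOCA API.

```python
from bisect import bisect_left
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical latency bucket upper bounds, in microseconds.
BUCKETS_US = (1, 2, 5, 10, 50, 100, 1000)


@dataclass
class FlowStats:
    packets: int = 0
    byte_count: int = 0
    # One slot per bucket, plus a final overflow slot for larger latencies.
    histogram: list = field(
        default_factory=lambda: [0] * (len(BUCKETS_US) + 1)
    )

    def record(self, size_bytes, latency_us):
        self.packets += 1
        self.byte_count += size_bytes
        # Bucket i counts latencies <= BUCKETS_US[i] and above the
        # previous bound; anything past the last bound lands in overflow.
        self.histogram[bisect_left(BUCKETS_US, latency_us)] += 1


class FlowTelemetry:
    """Per-flow statistics keyed by an arbitrary hashable flow key."""

    def __init__(self):
        self.flows = defaultdict(FlowStats)

    def observe(self, flow_key, size_bytes, latency_us):
        self.flows[flow_key].record(size_bytes, latency_us)


if __name__ == "__main__":
    telemetry = FlowTelemetry()
    # Flow key as (src_ip, dst_ip, protocol, src_port, dst_port).
    key = ("10.0.0.1", "10.0.0.2", 6, 1234, 80)
    telemetry.observe(key, 1500, 1.0)
    telemetry.observe(key, 64, 3.0)
    print(telemetry.flows[key])
```

Fixed bucket boundaries keep each update to a constant-size counter increment, which is the property that makes histograms practical to maintain per flow at high packet rates.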