08 Performance Analysis

Measuring Success

Benchmarks, metrics, and real-world performance data that prove (or disprove) wire-speed tenant isolation claims.

Line Rate: 400G
P99 Latency: <10µs
Packet Loss: 0%
Isolation: 100%

Bandwidth Performance

DPU throughput varies with packet size and processing complexity. Small packets are limited by packet rate (the per-packet processing cost), while large packets are limited by link bandwidth.

📊 Throughput by Packet Size

Packet Size          Throughput   % of Line Rate
64 bytes             100 Gbps     25%
512 bytes            240 Gbps     60%
1500 bytes           340 Gbps     85%
9000 bytes (Jumbo)   400 Gbps     100%

The full 400 Gbps line rate is reached only with jumbo frames (MTU 9000).
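The relationship between packet size and packet rate can be sketched with a little arithmetic. The example below is a minimal illustration, not a benchmark tool: it assumes the packet sizes above are Ethernet frame sizes and that each frame carries the standard 20 bytes of on-wire overhead (preamble, start-of-frame delimiter, inter-frame gap). The function names and the `measured_gbps` dictionary are hypothetical and exist only for this example.

```python
# Bytes each frame occupies on the wire beyond the frame itself:
# 7-byte preamble + 1-byte start-of-frame delimiter + 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def max_packet_rate_mpps(link_gbps: float, frame_bytes: int) -> float:
    """Theoretical packet-rate ceiling, in millions of packets per second."""
    bits_per_frame_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_per_frame_on_wire / 1e6


def line_rate_fraction(achieved_gbps: float, link_gbps: float) -> float:
    """Share of the link bandwidth actually achieved (0.0 to 1.0)."""
    return achieved_gbps / link_gbps


LINK_GBPS = 400
# Throughput figures from this section, keyed by packet size in bytes.
measured_gbps = {64: 100, 512: 240, 1500: 340, 9000: 400}

for size, gbps in measured_gbps.items():
    ceiling = max_packet_rate_mpps(LINK_GBPS, size)
    share = line_rate_fraction(gbps, LINK_GBPS)
    print(f"{size:>5} B: {share:.0%} of line rate, ceiling {ceiling:.1f} Mpps")
```

Under these assumptions, the ceiling for 64-byte frames on a 400 Gbps link works out to roughly 595 Mpps, which is why small-packet tests are a packet-rate exercise rather than a bandwidth one.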

End-to-End Latency Breakdown

Every microsecond counts in AI workloads. Understanding where latency comes from helps optimize the entire path.

⏱️ Latency Components (Total: 8.5µs)

Wire: 1µs
Parse: 1.5µs
Policy: 3µs
QoS: 2µs
Transmit: 1µs
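As a quick check on the breakdown, the stage figures can be summed and ranked to see where optimization effort would pay off most. This is an illustrative sketch only: the dictionary and function names are made up for the example, and the values are copied from the component list above.

```python
# Per-stage latency in microseconds, taken from the breakdown above.
STAGE_LATENCY_US = {
    "Wire": 1.0,
    "Parse": 1.5,
    "Policy": 3.0,
    "QoS": 2.0,
    "Transmit": 1.0,
}


def total_latency_us(stages: dict[str, float]) -> float:
    """End-to-end latency as the sum of all stages."""
    return sum(stages.values())


def stage_shares(stages: dict[str, float]) -> dict[str, float]:
    """Fraction of the total latency spent in each stage."""
    total = total_latency_us(stages)
    return {name: us / total for name, us in stages.items()}


def dominant_stage(stages: dict[str, float]) -> str:
    """Name of the stage that contributes the most latency."""
    return max(stages, key=stages.get)


for name, share in stage_shares(STAGE_LATENCY_US).items():
    print(f"{name:<9} {STAGE_LATENCY_US[name]:>4.1f} µs  {share:.0%}")
print(f"Total     {total_latency_us(STAGE_LATENCY_US):>4.1f} µs")
```

The stages sum to the stated 8.5µs total, with policy evaluation as the largest single contributor.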

Real-World Performance

Lab benchmarks vs production reality. The numbers that matter for actual AI training workloads.

Metric             Spec       Measured   Status
Max Throughput     400 Gbps   398 Gbps   ✓ PASS
Packet Rate        595 Mpps   520 Mpps   ⚠ 87%
P99 Latency        <10µs      8.5µs      ✓ PASS
Jitter             <1µs       0.8µs      ✓ PASS
Packet Loss        0%         0.001%     ⚠ Under Load
Isolation Breach   0          0          ✓ SECURE
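The comparison of spec against measurement can be sketched as two small checks: attainment for metrics where higher is better, and a limit test for metrics where lower is better. This is a hedged illustration using the figures from the table; the helper names are hypothetical, and the source does not state what tolerance turns 398 Gbps against a 400 Gbps spec into a PASS, so the sketch reports percentages rather than assigning a status.

```python
def attainment_pct(measured: float, spec: float) -> float:
    """Measured value as a percentage of the spec (higher-is-better metrics)."""
    return 100.0 * measured / spec


def within_limit(measured: float, limit: float, strict: bool = True) -> bool:
    """True if a lower-is-better metric stays under (or at) its limit."""
    return measured < limit if strict else measured <= limit


# Higher-is-better metrics: (measured, spec)
print(f"Max Throughput: {attainment_pct(398, 400):.1f}% of spec")
print(f"Packet Rate:    {attainment_pct(520, 595):.0f}% of spec")

# Lower-is-better metrics: strict limits for "<" specs, inclusive for exact ones.
print("P99 Latency within <10µs:", within_limit(8.5, 10))
print("Jitter within <1µs:      ", within_limit(0.8, 1))
print("Packet Loss at 0%:       ", within_limit(0.001, 0, strict=False))
print("Isolation Breach at 0:   ", within_limit(0, 0, strict=False))
```

The packet-rate attainment reproduces the 87% shown in the Status column, and the packet-loss check fails because any non-zero loss exceeds a 0% spec.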

System Utilization

Real-time view of DPU resource utilization during a typical AI training workload.

📈 Resource Utilization

ARM CPU (8x A78 Cores): 45%
Accelerators (Crypto + RegEx): 78%
Memory (16GB DDR5): 62%

Latency Heatmap

24-hour view of latency distribution across all tenant traffic. Each cell represents 2 hours × one latency bucket.

🗓️ Latency Distribution Over Time

[Heatmap: time axis from 00:00 to 24:00; latency buckets range from <5µs to >15µs]
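A grid like this can be built by counting latency samples per 2-hour window and latency bucket. The sketch below is one possible way to do the binning, under stated assumptions: only the outer buckets (<5µs and >15µs) appear in the source, so the intermediate edges at 10µs and 15µs are invented for illustration, as are the function names and the `(seconds_since_midnight, latency_us)` sample format.

```python
# Bucket boundaries in microseconds. Only the <5µs and >15µs extremes come
# from the source; the 10µs edge and the exact split are assumptions.
BUCKET_EDGES_US = [5.0, 10.0, 15.0]
BUCKET_LABELS = ["<5µs", "5-10µs", "10-15µs", ">15µs"]

SECONDS_PER_CELL = 2 * 60 * 60   # each column covers 2 hours
CELLS_PER_DAY = 12               # 24 hours / 2 hours


def latency_bucket(latency_us: float) -> int:
    """Row index of the bucket a latency sample falls into."""
    for index, edge in enumerate(BUCKET_EDGES_US):
        if latency_us < edge:
            return index
    # Samples at or above the last edge land in the top bucket.
    return len(BUCKET_EDGES_US)


def heatmap_counts(samples):
    """Count samples per (latency bucket, 2-hour window).

    samples: iterable of (seconds_since_midnight, latency_us) pairs.
    Returns a list of rows, one per bucket, each with 12 column counts.
    """
    grid = [[0] * CELLS_PER_DAY for _ in BUCKET_LABELS]
    for seconds, latency_us in samples:
        column = int(seconds // SECONDS_PER_CELL) % CELLS_PER_DAY
        grid[latency_bucket(latency_us)][column] += 1
    return grid


example = [(0, 4.0), (7199, 8.5), (7200, 8.5), (86399, 20.0)]
for label, row in zip(BUCKET_LABELS, heatmap_counts(example)):
    print(f"{label:>8} {row}")
```

Each count would then be mapped to a color intensity when the heatmap is rendered.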

DPU vs Traditional

Side-by-side comparison of DPU-based isolation versus traditional software approaches.

Metric          🔷 BlueField-3 DPU   💻 OVS + DPDK
Throughput      400 Gbps             100 Gbps
Latency (P99)   8.5µs                150µs
CPU Overhead    0%                   8 cores
Isolation       Hardware             Software
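The size of the gap can be made explicit by turning the two columns into ratios. This is a small arithmetic sketch using only the figures listed in the comparison; the dictionary keys and function names are hypothetical.

```python
# Figures copied from the comparison above.
DPU = {"throughput_gbps": 400.0, "p99_latency_us": 8.5}
OVS_DPDK = {"throughput_gbps": 100.0, "p99_latency_us": 150.0}


def throughput_gain(fast: dict, slow: dict) -> float:
    """How many times more throughput the faster option delivers."""
    return fast["throughput_gbps"] / slow["throughput_gbps"]


def latency_reduction(fast: dict, slow: dict) -> float:
    """How many times lower the P99 latency of the faster option is."""
    return slow["p99_latency_us"] / fast["p99_latency_us"]


print(f"Throughput: {throughput_gain(DPU, OVS_DPDK):.1f}x higher")
print(f"P99 latency: {latency_reduction(DPU, OVS_DPDK):.1f}x lower")
```

On these numbers the DPU path carries four times the throughput at roughly one seventeenth of the P99 latency, without consuming host cores.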