Chapter 8: Performance Analysis | Wire-Speed Tenant Isolation

Throughput

Bandwidth Performance

DPU throughput varies by packet size and processing complexity. Small packets stress the packet rate, while large packets stress bandwidth.

📊 Throughput by Packet Size

Packet Size	Throughput	% of Line Rate
64 bytes	100 Gbps	25%
512 bytes	240 Gbps	60%
1500 bytes	340 Gbps	85%
9000 bytes (Jumbo)	400 Gbps	100%

Full 400 Gbps line rate is achieved only with jumbo frames (MTU 9000).
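To see why small packets stress the packet rate, it helps to convert link speed into frames per second. The sketch below is illustrative only: it assumes the packet sizes above are full Ethernet frame sizes and adds the standard 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter, inter-frame gap). The function name `line_rate_pps` is hypothetical, not part of any DPU toolkit.

```python
# Ethernet adds 20 bytes per frame on the wire: 7-byte preamble,
# 1-byte start-of-frame delimiter, and a 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20


def line_rate_pps(link_gbps: float, frame_bytes: int) -> float:
    """Maximum frames per second a link can carry at a given frame size.

    Assumption: frame_bytes is the full Ethernet frame (header + payload + FCS).
    """
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_gbps * 1e9 / bits_per_frame


for size in (64, 512, 1500, 9000):
    mpps = line_rate_pps(400, size) / 1e6
    print(f"{size:>5} B: {mpps:8.1f} Mpps needed for 400 Gbps")
```

At 64 bytes this works out to roughly 595 Mpps for a 400 Gbps link, so the per-packet work (parse, policy lookup, QoS) must complete far more often per second than it does with jumbo frames.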

Latency

End-to-End Latency Breakdown

Every microsecond counts in AI workloads. Understanding where latency comes from helps optimize the entire path.

⏱️ Latency Components (Total: 8.5µs)

Stage	Latency
Wire	1µs
Parse	1.5µs
Policy	3µs
QoS	2µs
Transmit	1µs
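A quick way to decide where optimization effort pays off is to rank the stages by their share of the total. The following is a minimal sketch using only the figures listed above; the helper name `stage_shares` is an illustrative choice.

```python
# Per-stage latency in microseconds, taken from the breakdown above.
stages_us = {
    "Wire": 1.0,
    "Parse": 1.5,
    "Policy": 3.0,
    "QoS": 2.0,
    "Transmit": 1.0,
}

total_us = sum(stages_us.values())


def stage_shares(stages: dict) -> dict:
    """Return each stage's fraction of the summed latency."""
    total = sum(stages.values())
    return {name: value / total for name, value in stages.items()}


shares = stage_shares(stages_us)

# Print stages from largest to smallest contribution.
for name in sorted(shares, key=shares.get, reverse=True):
    print(f"{name:<9}{stages_us[name]:>4.1f} µs  {shares[name]:6.1%}")
print(f"{'Total':<9}{total_us:>4.1f} µs")
```

The stages sum to the stated 8.5µs, and policy evaluation is the largest single contributor at a little over a third of the path.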

Benchmarks

Real-World Performance

Lab benchmarks versus production reality: the table below compares each specified figure with the measured result for the metrics that matter to actual AI training workloads.

Metric	Spec	Measured	Status
Max Throughput	400 Gbps	398 Gbps	✓ PASS
Packet Rate	595 Mpps	520 Mpps	⚠ 87%
P99 Latency	<10µs	8.5µs	✓ PASS
Jitter	<1µs	0.8µs	✓ PASS
Packet Loss	0%	0.001%	⚠ Under Load
Isolation Breach	0	0	✓ SECURE
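The percentage in the Packet Rate row can be reproduced directly from the spec and measured columns. The sketch below treats throughput and packet rate as "higher is better" metrics and latency and jitter as upper limits. That split, and the function names, are assumptions made for illustration rather than the pass criteria actually used for the table.

```python
def percent_of_spec(measured: float, spec: float) -> float:
    """For higher-is-better metrics: how much of the spec was reached."""
    return 100.0 * measured / spec


def headroom_percent(measured: float, limit: float) -> float:
    """For upper-limit metrics: how far below the limit the result sits."""
    return 100.0 * (limit - measured) / limit


# Values copied from the benchmark table.
print(f"Throughput:  {percent_of_spec(398, 400):.1f}% of spec")
print(f"Packet rate: {percent_of_spec(520, 595):.1f}% of spec")
print(f"P99 latency: {headroom_percent(8.5, 10):.0f}% headroom under the limit")
print(f"Jitter:      {headroom_percent(0.8, 1.0):.0f}% headroom under the limit")
```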

Live Metrics

System Utilization

Real-time view of DPU resource utilization during a typical AI training workload.

📈 Resource Utilization

Resource	Utilization	Details
ARM CPU	45%	8x A78 Cores
Accelerators	78%	Crypto + RegEx
Memory	62%	16GB DDR5

Traffic Analysis

Latency Heatmap

24-hour view of latency distribution across all tenant traffic. Each cell represents 2 hours × one latency bucket.

🗓️ Latency Distribution Over Time

[Heatmap: time axis from 00:00 to 24:00; latency scale from <5µs to >15µs]
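A grid like this can be built by counting latency samples per 2-hour window and latency bucket. The sketch below is one possible approach, not the method used to produce the chart. Only the `<5µs` and `>15µs` ends of the scale appear above, so the intermediate bucket edges (5, 10, and 15µs) are assumed, as are the sample format and the function names.

```python
HOURS_PER_CELL = 2
COLUMNS = 24 // HOURS_PER_CELL  # 12 time windows per day

# Assumed bucket layout; only the first and last labels come from the chart.
BUCKET_LABELS = ["<5µs", "5-10µs", "10-15µs", ">15µs"]


def bucket_index(latency_us: float) -> int:
    """Map a latency sample to its row in the heatmap."""
    if latency_us < 5.0:
        return 0
    if latency_us < 10.0:
        return 1
    if latency_us <= 15.0:
        return 2
    return 3


def build_heatmap(samples):
    """Count samples per cell.

    samples: iterable of (seconds_since_midnight, latency_us) pairs.
    Returns grid[row][col], with rows ordered as BUCKET_LABELS.
    """
    grid = [[0] * COLUMNS for _ in BUCKET_LABELS]
    for seconds, latency_us in samples:
        col = int(seconds // (HOURS_PER_CELL * 3600)) % COLUMNS
        grid[bucket_index(latency_us)][col] += 1
    return grid


# Three made-up samples, purely to exercise the function.
demo = [(3600, 4.2), (13 * 3600, 8.5), (23 * 3600 + 59 * 60, 16.0)]
for label, row in zip(BUCKET_LABELS, build_heatmap(demo)):
    print(f"{label:>8} {row}")
```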

Comparison

DPU vs Traditional

Side-by-side comparison of DPU-based isolation versus traditional software approaches.

Metric	🔷 BlueField-3 DPU	💻 OVS + DPDK
Throughput	400 Gbps	100 Gbps
Latency (P99)	8.5µs	150µs
CPU Overhead	0%	8 cores
Isolation	Hardware	Software
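The gap between the two approaches is easier to read as ratios. This short sketch uses only the throughput and P99 latency figures from the comparison above; the dictionary keys are illustrative names.

```python
# Figures copied from the comparison above.
dpu = {"throughput_gbps": 400, "p99_latency_us": 8.5}
ovs_dpdk = {"throughput_gbps": 100, "p99_latency_us": 150}

throughput_gain = dpu["throughput_gbps"] / ovs_dpdk["throughput_gbps"]
latency_reduction = ovs_dpdk["p99_latency_us"] / dpu["p99_latency_us"]

print(f"Throughput:  {throughput_gain:.1f}x higher on the DPU")
print(f"P99 latency: {latency_reduction:.1f}x lower on the DPU")
```

On these figures the DPU delivers 4x the throughput at roughly one seventeenth of the P99 latency.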