04 Core Technology

Understanding DPUs

Data Processing Units are specialized processors that offload networking, storage, and security tasks from CPUs—enabling wire-speed processing at the infrastructure layer.

What is a Data Processing Unit?

A DPU is a new class of programmable processor that combines three key elements: a high-performance CPU, a high-throughput network interface, and a rich set of flexible accelerators. Think of it as a "smart NIC" on steroids—purpose-built to handle infrastructure tasks that would otherwise burden your main processors.

NVIDIA BlueField-3
- Processing: 16× ARM Cortex-A78 cores @ 3.0 GHz
- 🌐 Connectivity: 400 Gbps ConnectX-7 network engine
- 📦 Memory: 32GB DDR5 + 16GB HBM
- 🔒 Security: crypto engine, IPsec/TLS @ 400G
- 🚀 Acceleration: custom ASICs for RegEx, compression, ML
- 💾 Storage: NVMe/VIRTIO via SNAP + virtio-blk

Data Flow Through a DPU

When a network packet arrives, the DPU intercepts it before it reaches the host CPU. The packet is processed, classified, encrypted/decrypted, and forwarded—all without consuming host CPU cycles.

🌐 Network (400G Ethernet) → 📥 Ingress (packet receive) → ⚙️ Process (DPU pipeline) → 📤 Egress (packet send) → 🖥️ Host CPU (only application data)
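The host-bypass path above can be sketched in a few lines. This is a toy model, not a real BlueField or DOCA API: the stage names mirror the diagram, the 42-byte header size is an assumed Ethernet+IP+UDP framing, and the pipeline stage is a placeholder for the policy/crypto work the DPU performs at line rate.

```python
# Toy sketch of the host-bypass data path: the DPU parses and processes
# each frame, and only the application payload reaches host memory.

HEADER_LEN = 42  # assumed Eth + IPv4 + UDP header size, for illustration

def dpu_ingress(raw_frame: bytes) -> dict:
    """Parse and classify on the DPU; the host never sees headers."""
    return {"headers": raw_frame[:HEADER_LEN], "payload": raw_frame[HEADER_LEN:]}

def dpu_pipeline(pkt: dict) -> dict:
    """Placeholder for decrypt / policy / QoS stages (elided here)."""
    return pkt

def deliver_to_host(pkt: dict) -> bytes:
    """Only application data crosses into host memory."""
    return pkt["payload"]

frame = b"\x00" * HEADER_LEN + b"hello-app"
print(deliver_to_host(dpu_pipeline(dpu_ingress(frame))))  # b'hello-app'
```

The point of the shape: header handling and classification never touch host code, which is why the diagram's final hop carries "only application data."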

CPU vs DPU Processing

Traditional networking runs on the host CPU, competing with applications for cycles. DPUs offload this work entirely, providing predictable performance and freeing the CPU for revenue-generating workloads.

🔴 CPU Processing (software-based networking)
- Latency: 50-200 μs
- Throughput: ~50 Gbps max
- CPU overhead: 30-40%
- Latency jitter: high variance
- Isolation: bypassable

🟢 DPU Processing (hardware-accelerated)
- Latency: <5 μs
- Throughput: 400 Gbps
- CPU overhead: ~0%
- Latency jitter: sub-μs variance
- Isolation: hardware-enforced
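The comparison above implies a concrete savings you can estimate. A back-of-envelope sketch, using the 30-40% overhead range and ~50 Gbps software ceiling quoted in this section; the 64-core server size is an assumption for illustration only.

```python
# Back-of-envelope: host cores reclaimed by offloading networking to a DPU.

cores = 64           # assumed host core count (illustrative)
sw_overhead = 0.35   # midpoint of the 30-40% overhead range quoted above

cores_lost = cores * sw_overhead
print(f"cores consumed by software networking: {cores_lost:.0f}")  # 22

# With the stack on the DPU (~0% host overhead), those cores return
# to revenue-generating workloads.
print(f"throughput gain over software ceiling: {400 / 50:.0f}x")  # 8
```

Even on this rough math, a third of the server comes back, which is the economic argument behind "freeing the CPU for revenue-generating workloads."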

The DPU Processing Pipeline

Inside the DPU, packets flow through a series of specialized engines. Each stage is hardware-optimized for specific tasks, enabling wire-speed processing with deterministic latency.

Packet Processing Stages
1. Header Parsing (<100 ns): extract L2/L3/L4 headers, identify protocol type, prepare metadata
2. Flow Matching (<200 ns): TCAM-based lookup against flow tables, identify tenant and policy
3. Action Execution (<300 ns): apply NAT, VXLAN encap/decap, and ACL filtering
4. Crypto Processing (<500 ns): IPsec/TLS encryption or decryption at full line rate
5. QoS & Scheduling (<200 ns): traffic shaping, priority queuing, rate limiting per tenant
6. Forwarding Decision (<100 ns): determine output port, apply final transformations, transmit
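Summing the worst-case bounds above gives the pipeline's latency budget. A minimal sketch using only the per-stage figures quoted in this section:

```python
# Worst-case latency budget for the six pipeline stages listed above.

stages_ns = {
    "header parsing":      100,
    "flow matching":       200,
    "action execution":    300,
    "crypto processing":   500,
    "qos & scheduling":    200,
    "forwarding decision": 100,
}

total_ns = sum(stages_ns.values())
print(f"worst-case pipeline latency: {total_ns} ns ({total_ns / 1000:.1f} us)")
# worst-case pipeline latency: 1400 ns (1.4 us)
```

At 1.4 μs worst case, the fast path sits comfortably under the <5 μs end-to-end figure quoted earlier; the remaining budget covers SerDes, queuing, and PCIe/DMA transfer outside these six stages.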

Key Benefits of DPU Architecture

DPUs fundamentally change how infrastructure services are delivered, providing security, performance, and efficiency improvements that weren't possible before.

🎯 Complete Offload: networking, storage, and security run on the DPU, leaving the host CPU 100% available for applications.
🛡️ Hardware Isolation: tenant boundaries are enforced in silicon, so software vulnerabilities on the host cannot bypass the controls.
Predictable Performance: deterministic latency with sub-microsecond jitter, ideal for latency-sensitive AI workloads.
📈 Linear Scaling: each DPU handles its own traffic, so adding servers adds proportional network capacity.