Chapter 6: Microbursts & AI Traffic | Wire-Speed Tenant Isolation

The Phenomenon

What is a Microburst?

A microburst is a sudden, massive spike in network traffic lasting microseconds to milliseconds. While average bandwidth might look acceptable, these instantaneous peaks can be 10-100x higher—overwhelming buffers and causing packet loss.

⚡
Microburst Traffic Pattern

400 Gbps PEAK

t = 0 500µs 1ms 1.5ms 2ms

Green dashed line = Average bandwidth (40 Gbps) • Orange spike = Instantaneous burst (400 Gbps)

The Contrast

Traditional vs AI Traffic Patterns

Traditional web and enterprise traffic spreads smoothly across time. AI training traffic is fundamentally different—thousands of GPUs synchronize simultaneously, creating explosive bursts that shatter network assumptions.

🌐 Traditional Traffic

Peak/Avg Ratio

~2x

Burst Duration

seconds

🤖 AI Training Traffic

Peak/Avg Ratio

100x

Burst Duration

<1ms

The Root Cause

GPU Gradient Synchronization

During AI training, GPUs compute independently, then must share results simultaneously. When 10,000+ GPUs all transmit at once, network fabric sees an instant traffic explosion.

🔄 Distributed Training: All-Reduce Synchronization

NODE A • 8x H100

GPU 0

GPU 1

GPU 2

GPU 3

GPU 4

GPU 5

GPU 6

GPU 7

400G
InfiniBand

NODE B • 8x H100

GPU 0

GPU 1

GPU 2

GPU 3

GPU 4

GPU 5

GPU 6

GPU 7

Each GPU computes gradients → All GPUs transmit simultaneously → Network sees 16× parallel 400G flows

⏱️ Training Step Timeline

Forward/Backward Pass (Compute)

All-Reduce Sync (BURST)

Weight Update (Low Traffic)

The Consequences

Why Microbursts Matter

When bursts overwhelm network buffers, packets are dropped. In AI training, dropped packets trigger retransmissions that delay ALL GPUs, creating a cascade of wasted compute cycles.

📦

Buffer Overflow

Switch buffers fill in microseconds during bursts

5µs

📉

Packet Loss

Even 0.1% loss degrades training by hours

0.1%

💸

GPU Idle Time

$10,000+ GPUs waiting for network

30%

The Chain Reaction

How One Burst Breaks Everything

A single microburst can trigger a cascade that stalls thousands of GPUs across multiple tenants—demonstrating why isolation at wire speed is non-negotiable.

🔗 The Cascade Effect

⚡

Microburst

→

📦

Buffer Full

→

❌

Packet Drop

→

🔄

Retransmit

→

⏸️

GPU Stall