// Chief Architect, Agentic Intelligence

SUBRAMANIYAM VENKATA POONI

Building Agentic LLM Systems with LangChain, Multi-Agent RAG & Agent-Based Reasoning

AP2 | SLIM | X-A2A | LLMOps | Scalable AI Inference | Compilers: MLIR, LLVM, WASM

31+ Years · 6 Patents · 2 Exits · $50M+ Deals · 12+ Trade Secrets · 4 Awards

CSSQUAREDB Technologies

Enterprise AI Agent Development & LLMOps

CS²B Technologies


Cognitive Systems × Scalable Behaviors

CS²B Technologies bridges the gap between AI research and production systems. We design, build, and operate multi-agent AI systems, from RAG pipelines to protocol integrations, with the observability and guardrails required for real-world enterprise deployment.

Our Mission

"We help enterprises deploy and operate AI agent systems reliably at scale."

50+ Agent Systems · 99.5% Uptime SLA · 40% Cost Reduction · 3.5× Faster Delivery
Core Services
🧠 Multi-Agent RAG 🔗 Protocol Integration 📊 LLMOps ⚡ Inference Optimization 🛡️ Security & Compliance 🚀 Managed Operations
Protocol Expertise
MCP A2A X-A2A AP2 SLIM ACP ANP AGORA

Work Experience

Three decades of innovation across enterprise, startups, and research

CSSQUAREDB Technologies Inc.

Chief Architect & Founding Engineer, Agentic Intelligence

Sep 2025 – Present · San Jose, CA

  • Multi-Stack GenAI on DGX/HGX B200 and AMD MI350X with Nemotron and DeepSeek models
  • Agent Systems: LangChain, LangGraph, ReAct, Google ADK, Anthropic MCP
  • Agentic RAG with A2A, AP2, SLIM protocols
  • Enterprise LLM Hosting, Inference Engineering, Cost Optimization
Multi-Stack GenAI Agentic RAG MCP Protocols

Broadcom (VMware)

Principal Architect – GenAI | HPC | SDDC

Dec 2021 – Aug 2025 · 3 yrs 9 mos · Palo Alto, CA

  • PAIF-N AIaaS for CSPs: GPU-as-a-Service, AI PaaS, Model-as-a-Service
  • RAG Stack CI/CD, LLM Agent Programming, VMware Aria Automation 8.x
  • 3× MLPerf throughput, 40% latency reduction, 2× faster LoRA fine-tuning
<2 min deploy ~40% margins 95%+ success $50M+ SOWs

CSSQUAREDB Technologies

Founder CTO – Telco Cloud / 5G Open RAN

Dec 2020 – Dec 2021 · 1 yr 1 mo · San Jose, CA

  • DISH 5G Open RAN: 7.8K+ sites, Zero Touch Provisioning
  • GitLab-Airflow pipeline, 135+ TKG clusters, 1,350 CNFs deployed in <6 hrs
70% US coverage 7.8K+ sites 128 parallel upgrades

Personal Goal Pursuit

Career Break · Key Projects & Innovations

Sep 2019 – Dec 2020 · 1 yr 4 mos · San Jose

  • 🚀 Distributed ML Parameter Server (Python, Raft, Asyncio) – 40%+ throughput
  • 🚀 MLIR-Based Compiler for WASM Edge Inference
  • Founded CSSQUAREDB TECHNOLOGIES INC. (Dec 2020)
40%+ throughput MLIR/LLVM compiler

Futurewei Technologies

Principal Lead · Software Competence Centre for Wireless, CTO Office

Feb 2016 – Sep 2019 · 3 yrs 8 mos · Santa Clara

  • SONiCS 1.0→4.0: Microservices, FaaS, MLaaS, Federated Learning
  • Collaborated with Berkeley AI Research (BAIR) & Georgia Tech
  • 🏆 Top Innovation Award · 🎖️ Future Star Medal · ⭐ WOW Team Award
1M+ FaaS/day 12+ trade secrets 4 awards

A10 Networks, Inc.

Director of Research & Development, CTO Office

Dec 2011 – Jan 2016 · 4 yrs 2 mos · San Jose

  • Transitioned 500+ engineers to a DevOps model with CI/CD & IaC
  • Cisco ACI, VMware NSX, OpenStack integration
3ร— faster releases $10M+ savings

Virtustream → Dell Acquisition

Principal SDN Architect

Aug 2010 – Dec 2011 · 1 yr 5 mos · SF Bay Area

  • xStream Platform: Storage/Networking for Data Centers
  • SDN/SDS orchestration lead
SDN/SDS lead Dell exit

Dorado Software

Network & Storage Architect

Nov 2007 – Aug 2010 · 2 yrs 10 mos · Folsom

  • Redcell VRM, Campus Manager, Storage Commander platforms

Hewlett Packard

Software Architect, Networking & Storage

Jul 1997 – Dec 2007 · 10 yrs 6 mos · Roseville

  • D2D Backup, OpenView Storage Area Manager (OVSAM)
  • SNIA and INCITS T10/T11 standards · 6 patents
6 patents 60% faster backups

Starcom Technology Inc

Systems Software Engineer

Jul 1997 – Dec 1999 · 2 yrs 6 mos · Pleasanton

  • Fibre Channel drivers, QLogic HBA, HP-UX, VxWorks & Nucleus RTOS

IBM (Sequent Computers)

Software Engineer, Base Operating Systems

Mar 1994 – Jul 1997 · 3 yrs 5 mos · Beaverton, OR

  • DYNIX/PTX OS kernel: thread scheduling, memory management, I/O stack
  • SMP & NUMA systems, device drivers

Key Projects

11 high-impact projects across AI, HPC, and systems engineering

🔷

VMware Aria Automation | Multi-Cloud IaC & CI/CD

Broadcom · Dec 2020 - Aug 2025

Advanced automation with Aria 8.x, HA clustering, SAML/LDAP, RBAC. Self-service hybrid cloud across VCF, AWS, Azure, GCP.

$50M+ SOWs 100% 5G success 50+ CI/CD flows 90%+ SLA
🔬

LLMOps Frameworks | Prompt Engineering | RAG

Broadcom · Jan 2025 - Aug 2025

Prompt-based LLM fine-tuning for Mistral-7B, Phi-2, Gemma. RLHF evaluations, GPU-aware observability, model versioning.

30% consistency 4× CI/CD <10s recovery
🤖

LLM Agent Programming

Broadcom · LangChain, Triton, FastAPI · Jun 2024 - Aug 2025

Modular agents for RAG-backed assistants and structured extractors with full observability.

95%+ success <2s latency 10K+ calls/week 50% fewer hallucinations
☁️

AIaaS for CSPs Expertise

Broadcom · VMware PAIF-N · Jun 2024 - Aug 2025

VMware Private AI Foundation with NVIDIA for CSPs. Monetization: GaaS, AI PaaS, Model-as-a-Service, AI Applications.

<2min deploy ~40% margins 3× TTR
🚀

AI Performance Engineering | HPC | LLM Inference

Broadcom · TensorRT, vLLM, Triton · Jun 2024 - Aug 2025

Training/inference optimization for Llama 3.1, Mixtral, BERT on NVIDIA (B200, GH200, H100) & AMD (MI350X).

3× MLPerf 40% latency ↓ 2× LoRA speed 95% success
⚡

Scalable Distributed ML Parameter Server

CSSQUAREDB · Python, Asyncio, Raft · Sep 2019 - Dec 2020

High-performance distributed training across multi-node clusters. Fault-tolerant key-value store with Raft consensus.

40%+ throughput Raft consensus PyTorch/TF
github.com/SubramaniyamPooni/pyraft
⚙️

Compiler for WebAssembly AI Edge Inference

CSSQUAREDB · Python, LLVM, MLIR · Nov 2019 - Dec 2020

Custom MLIR-based compiler targeting WebAssembly. C-like DSL with LLVM IR and WASM backends.

MLIR compiler C++ performance IoT/Edge
github.com/SubramaniyamPooni/compilers
🦀

Crusty Lox Interpreter in Rust

Personal · Crafting Interpreters · Dec 2024 - Present

Tree-walking interpreter with idiomatic Rust. Scanner, recursive descent parser, tree-walk evaluator.

Rust idioms FP patterns TDD
github.com/SubramaniyamPooni/crusty_interpreter
🐍

Applied R&D in Software Design Patterns

CSSQUAREDB · Python · Apr 2020 - Jun 2020

Modular software engineering framework. Functional, Reactive, Actor Model paradigms with test harness.

FP/OOP Reactive TDD harness
github.com/SubramaniyamPooni/advanced_python_programming
☕

Java Language New Features Experimentation

Personal · May 2020 - Jul 2020

JDK 9-14 features, Scala essentials, Functional Programming, Java Modules, Design Patterns exploration.

JDK 9-14 Scala FP in Java
🎨

Ray Tracer in Rust

Personal · Rust, Graphics, Linear Algebra

3D rendering engine with ray casting, materials, lighting, and reflections.

Rust Graphics Linear Algebra

Skills & Expertise

300+ technologies across the full stack

🏆 Top Endorsed Skills
Cloud Computing (20)
Device Drivers (16)
Linux (14)
Data Center (10)
Distributed Systems (10)
Virtualization (8)
Storage (8)
Operating Systems (7)
Software Development (6)
🤖 AI/ML & Agentic Systems
🔗 Agent Frameworks
LangChain LangGraph AutoGPT CrewAI Semantic Kernel Haystack LlamaIndex OpenAI Assistants Google ADK Anthropic MCP
🧠 LLM Integration
OpenAI API Claude API Gemini API Cohere HuggingFace Ollama LM Studio Nemotron DeepSeek
🗄️ Vector DBs & Memory
Pinecone Weaviate Chroma Qdrant Milvus FAISS Redis Neo4j MemGPT
📊 Testing & Observability
LangSmith W&B MLflow AgentOps Phoenix Arize DeepEval RAGAS Langfuse
🎮 NVIDIA / HPC Stack
🖥️ Hardware & Platforms
H100 A100 Grace Hopper DGX H100 DGX B200 HGX B200 SuperPOD Jetson Orin
⚡ AI/ML Software Stack
CUDA TensorRT Triton Server NeMo TAO Toolkit RAPIDS cuDF cuML DeepStream
🌐 Networking & Storage
InfiniBand HDR/NDR RoCEv2 ConnectX BlueField DPU GPUDirect RDMA NVMe-oF
🚀 DevOps & Cloud Infrastructure
🔄 CI/CD & IaC
Jenkins GitLab CI GitHub Actions Terraform Ansible ArgoCD FluxCD
🐳 Containers & Cloud
Docker Kubernetes OpenShift Helm AWS Azure GCP VMware VCF
📊 Monitoring & Security
Prometheus Grafana ELK Stack OpenTelemetry Vault Snyk Trivy
⚙️ Languages, Compilers & Architecture
💻 Polyglot
Python Rust Go C C++ Java Scala
🔧 Compiler Internals
LLVM MLIR TVM XLA WASM TensorRT ONNX
🏗️ Architecture
Microservices Event-driven SOA TOGAF Zachman SDDC
🔷 VMware / Broadcom Stack
VCF 5.2 Aria Automation 8.17 Aria Operations Cloud Director 10.6 NSX 4.x vSAN 8 vSphere 8 TKG vRO ABX

US Patents

6 patents in storage and networking technologies

US 8,375,396 Backup Procedure with Transparent Load Balancing 2013
US 7,610,295 Generating Persistent Path Identifiers 2009
US 7,181,553 Identifying Multiple Paths to SCSI Device 2007
US 7,069,351 Method for Identifying SCSI Logical Units 2006
US 6,934,710 Managing Fabric Device Access 2005
US 10/260,419 Communicating with SCSI Devices on Linux 2002

Honors & Awards

🏆

Top Innovation Award

GANs for Wireless

Futurewei Technologies · 2019

🏅

Outstanding Contributions

AI in Wireless Space

Huawei NJ Research · Dec 2018

⭐

WOW Team Award

GAN SW Receiver

Huawei NJ Research · Dec 2018

🎖️

Future Star Medal

Outstanding AI Contributions

Huawei · 2017

Certifications & Courses

Continuous learning in cutting-edge technologies · 20+ certifications

VMware Livefire Certifications

Advanced hands-on technical training programs

VMware NSX Training

Software-defined networking and security

VMware Aria Automation Training

Infrastructure as code and cloud management

VMware vSphere & Cloud Director

Core virtualization and cloud service provider platforms

Team Awards

Recognition for outstanding contributions and teamwork

Official Course Documentation

Complete course materials from VMware training programs

Professional Courses

Specialized training in programming and systems design

λ

Functional Programming in Scala

John De Goes

🎨

The Art of Functional Design

John De Goes

🔧

Write a Compiler (Python)

David Beazley

🔗

Implementing Raft Consensus

David Beazley

🐍

Advanced Python

David Beazley

Education & Research

World-class foundation

🎓

IIT Madras

Research Assistant

Computer Speech Lab · Indian Institute of Technology

🎓

Mangalore University

Master's Degree

Management Information Systems · First Class with Distinction

🎓

Sri Venkateswara College of Engineering

Bachelor's Degree

Computer Science & Engineering · 1987-1991

Research & Publications

Ongoing research contributions in AI, distributed systems, and compiler technologies

🔬 ACTIVE RESEARCH AREAS

🤖

Agentic AI Systems

Multi-agent orchestration, ReAct patterns, tool-augmented reasoning, and autonomous AI workflows using LangChain, LangGraph, and custom agent frameworks.

Multi-Agent RAG AP2 Protocol X-A2A
⚡

LLM Inference Optimization

High-performance inference pipelines, quantization strategies, KV-cache optimization, and hardware-aware compilation for NVIDIA and AMD accelerators.

TensorRT vLLM Triton
🔧

Compiler Technologies

MLIR-based compiler design, LLVM optimization passes, WebAssembly targets for edge AI, and domain-specific language development.

MLIR LLVM WASM

📚 PUBLICATIONS & PAPERS

🖥️ AI ACCELERATOR PLATFORMS

AI Accelerator Market Report 2026: The Platform Race

Subramaniyam V. Pooni

Industry Report · January 2026

NVIDIA AMD Intel HPC

Comprehensive analysis of the AI accelerator landscape in 2026, covering GPU, TPU, and custom ASIC architectures from NVIDIA, AMD, Intel, Google, and emerging players.


NVIDIA CUDA Platform: Deep Dive Documentation Series

Subramaniyam V. Pooni

Technical Documentation · 2026 · 6 Chapters · CUDA 14.x | Rubin Architecture

CUDA PTX Hopper Blackwell

Comprehensive technical exploration of GPU computing architecture, from PTX code to kernel execution. Covers the CUDA compilation pipeline and GPU architecture evolution (Pascal→Volta→Ampere→Hopper→Blackwell→Rubin).


AMD ROCm Platform: Deep Dive Documentation Series

Subramaniyam V. Pooni

Technical Documentation · 2026 · 7 Chapters · ROCm 6.x | MI300X/MI350X

ROCm HIP CDNA MI300

Comprehensive technical exploration of AMD's ROCm platform and CDNA architecture. Covers HIP programming, the AMDGPU compiler, Instinct accelerators (MI250→MI300→MI350), and datacenter GPU deployments.


Google TPU & XLA Platform: Deep Dive Documentation Series

Subramaniyam V. Pooni

Technical Documentation · 2026 · 3 Chapters · TPU v7 Ironwood | JAX/XLA Stack

TPU XLA JAX HLO

Comprehensive technical exploration of Google's Tensor Processing Units, XLA compiler infrastructure, and JAX programming framework. Covers TPU architecture evolution (v1→v4→v5e→v5p→Trillium→Ironwood).


AWS Trainium & Neuron Platform: Deep Dive Documentation Series

Subramaniyam V. Pooni

Technical Documentation · 2026 · 7 Chapters · Trainium3 | Neuron SDK | NKI

Trainium Neuron SDK NKI NeuronCore

Comprehensive technical exploration of Amazon's Trainium AI accelerators, Neuron SDK compiler infrastructure, and NeuronCore architecture. Covers the Trainium evolution (1→2→3) and NKI kernel programming.


🌐 HIGH-PERFORMANCE NETWORKING & INTERCONNECTS

Ultra Ethernet vs RDMA + NVMe-oF Integration

Subramaniyam V. Pooni

Technical Reference · December 2025 · 8 Sections · UEC Spec 1.0 | NVMe-oF 1.1

UEC RDMA NVMe-oF RoCE v2

Comprehensive analysis of high-performance networking for AI/HPC and storage. Covers RDMA fundamentals, UEC architecture, memory operations, flow control, NVMe over Fabrics, AI collectives.

1M+ UEC Endpoints <2μs NVMe-oF Latency 800G Link Speed

CXL + UEC Integration: Bridging Internal Memory Fabric to External Network

Subramaniyam V. Pooni

LinkedIn Article · 2025 · Industry Analysis

CXL 3.0 UEC Memory Fabric

Analysis of how CXL and UEC technologies can be integrated to bridge internal memory fabric with external network fabric. Explores cache-coherent interconnects and memory pooling.

💾 STORAGE & MEMORY SYSTEMS

Storage Implications of New Generation AI Applications

Subramaniyam V. Pooni

LinkedIn Article · 2025 · AI Infrastructure

AI Storage LLM Inference Data Pipeline

Analysis of storage architecture requirements for next-generation AI workloads. Explores checkpoint/restart patterns, model weight distribution, and evolving storage hierarchy.


Storage is the Bottleneck: A GPU-NVMe Technical Deep Dive

Subramaniyam V. Pooni

Technical Documentation · 2025 · 4 Chapters, 26 Sections, 1MB+

NVMe GPUDirect CUDA HPC

Publication-quality technical documentation on GPU-storage integration challenges. Covers NVMe queue architecture, doorbell serialization, GPUDirect Storage, CXL memory semantics.


Distributed Endpoint Architecture for KV-Cache Offloading in LLM Inference

Subramaniyam V. Pooni

Technical Reference v3.0 · December 2025 · 13 Chapters, 10 Appendices

KV-Cache CXL 3.0 LLM Inference

Memory-efficient LLM serving using CXL-based intelligent memory endpoints. Per-head tracking, EMA-based attention scoring, RoPE-aware prefetch, hardware-accelerated cache management.

6× Memory Expansion 16× User Capacity 97% HBM Hit Rate 36% Cost Reduction
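The reference above lists EMA-based attention scoring with per-head tracking as the signal for keeping KV-cache entries in HBM versus a CXL-attached tier. The snippet below is a minimal, hypothetical sketch of that general idea, not the reference's actual design: it assumes each attention head reports a per-token attention weight at every decode step, keeps an exponential moving average per (head, token) pair, and treats the lowest-scoring tokens as offload candidates. The class name `EmaKvScorer`, the `alpha` value, and the "coldest k" policy are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class EmaKvScorer:
    """Hypothetical per-head EMA scorer for KV-cache placement (illustrative only)."""
    alpha: float = 0.5  # assumed smoothing factor: weight given to the newest observation
    scores: dict[tuple[int, int], float] = field(default_factory=dict)

    def update(self, head: int, attn: list[float]) -> None:
        # attn[i] is the attention weight this head assigned to cached token i
        # at the current decode step; the list grows as the sequence grows.
        for token, weight in enumerate(attn):
            key = (head, token)
            prev = self.scores.get(key, 0.0)
            self.scores[key] = self.alpha * weight + (1.0 - self.alpha) * prev

    def coldest(self, head: int, k: int) -> list[int]:
        # Tokens with the lowest EMA score for this head are the candidates
        # to move out of HBM into the slower memory tier.
        ranked = sorted(
            (score, token)
            for (h, token), score in self.scores.items()
            if h == head
        )
        return [token for _, token in ranked[:k]]


if __name__ == "__main__":
    scorer = EmaKvScorer(alpha=0.5)
    scorer.update(head=0, attn=[0.7, 0.2, 0.1])
    scorer.update(head=0, attn=[0.6, 0.1, 0.3])
    print(scorer.coldest(head=0, k=1))  # [1]
```

A larger `alpha` makes placement react faster to shifts in attention, at the cost of more churn between tiers; a production design would also need RoPE-aware prefetch and hardware offload, which this sketch does not attempt.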

🧠 AI/ML ARCHITECTURE & SYSTEMS

Graph Neural Networks & Large Language Models: A Visual Guide

Subramaniyam V. Pooni

LinkedIn Article · 2025 · AI/ML Architecture

GNN LLM Transformers

Visual guide exploring the intersection of Graph Neural Networks and Large Language Models. Covers message passing, attention mechanisms, graph transformers, knowledge graph integration.


Scalable Multi-Agent RAG Architectures for Enterprise LLM Deployments

Subramaniyam V. Pooni

Working Paper · 2025

Agentic AI RAG LLMOps

Proposes a scalable architecture for deploying multi-agent RAG systems in enterprise environments, addressing challenges in agent coordination, memory management, and inference optimization.

Agent Communication Protocols & Context Engineering

Subramaniyam V. Pooni

Technical Documentation · CS²B Technologies · 2025 · 16 Chapters

MCP A2A Context Engineering WSCI

Comprehensive research document exploring MCP, A2A, and emerging standards for multi-agent communication. Deep dives into context engineering frameworks including the WSCI methodology. Features 6 protocols, 4 frameworks, and 50+ code examples.


🔧 CPU MICROARCHITECTURE

Bridge Checkpoint Unit (BCU) Microarchitecture

Subramaniyam V. Pooni

LinkedIn Article · 2025 · CPU Microarchitecture

Microarchitecture Checkpoint OoO Execution

Deep dive into Bridge Checkpoint Unit (BCU) microarchitecture for modern out-of-order processors. Explores checkpoint/rollback mechanisms, speculative execution recovery, register renaming.

⚙️ COMPILERS & DISTRIBUTED SYSTEMS

MLIR-Based Compiler Design for Edge AI Inference on WebAssembly

Subramaniyam V. Pooni

Technical Report · CSSQUAREDB Technologies · 2020

MLIR WASM Edge AI

Presents a novel MLIR-based compiler toolchain targeting WebAssembly for deploying lightweight AI models on edge devices with near-native performance.

Distributed Parameter Server with Raft Consensus for Federated Learning

Subramaniyam V. Pooni

Technical Report · CSSQUAREDB Technologies · 2020

Distributed ML Raft Federated

Introduces a fault-tolerant parameter server architecture using Raft consensus for distributed machine learning, achieving 40%+ throughput improvement in federated learning scenarios.

Multi-Target Compiler Infrastructure: Deep Dive Research

Subramaniyam V. Pooni

Technical Documentation · CS²B Technologies · 2025 · 9 Phases

LLVM WebAssembly LALR Type Systems

Comprehensive multi-target compiler infrastructure covering lexical analysis, LALR parsing, type checking, stack-based IR design, WebAssembly binary encoding, LLVM code generation, and Python bytecode emission.


PyRaft: Distributed Consensus Implementation & Documentation

Subramaniyam V. Pooni

Technical Documentation · CS²B Technologies · 2025 · Full Implementation

Raft Consensus Python Distributed

Complete Raft consensus algorithm implementation with leader election, log replication, cluster membership, and simulation framework. Includes comprehensive API reference and interactive visualizations.

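PyRaft lists leader election as a core feature. As a hedged illustration of one piece of that, the sketch below implements the receiver side of Raft's RequestVote rule as described in the Raft paper: reject candidates from a stale term, grant at most one vote per term, and vote only for a candidate whose log is at least as up-to-date as the voter's (compare last-entry terms first, then log length). It is not taken from the pyraft repository; `RaftNode` and `LogEntry` are hypothetical names, and persistence, RPC transport, and election-timer resets are deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LogEntry:
    term: int
    command: str


@dataclass
class RaftNode:
    """Hypothetical follower state: only what RequestVote needs (no persistence/RPC)."""
    current_term: int = 0
    voted_for: Optional[str] = None
    log: list[LogEntry] = field(default_factory=list)

    def last_log_info(self) -> tuple[int, int]:
        # (index, term) of the last entry; Raft indexes are 1-based, 0 = empty log
        if not self.log:
            return 0, 0
        return len(self.log), self.log[-1].term

    def handle_request_vote(self, term: int, candidate_id: str,
                            last_log_index: int, last_log_term: int) -> tuple[int, bool]:
        # 1. Reject candidates whose term is older than ours.
        if term < self.current_term:
            return self.current_term, False
        # A newer term means we move into it and forget any earlier vote.
        if term > self.current_term:
            self.current_term = term
            self.voted_for = None
        # 2. Grant only if we have not voted for someone else this term and the
        #    candidate's log is at least as up-to-date as ours (term first, then length).
        my_index, my_term = self.last_log_info()
        up_to_date = (last_log_term > my_term
                      or (last_log_term == my_term and last_log_index >= my_index))
        if self.voted_for in (None, candidate_id) and up_to_date:
            self.voted_for = candidate_id
            return self.current_term, True
        return self.current_term, False
```

Returning the voter's current term alongside the decision lets a stale candidate discover the newer term and step down, which is why both values travel together in the reply.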

🤖 GENERATIVE AI & EMERGENT PROPERTIES

Can AI Agents Go "Rogue" Because of Emergent Properties?

Subramaniyam V. Pooni

LinkedIn Article · AI Safety & Alignment

AI Safety Emergent

Emergent Properties in GenAI

Subramaniyam V. Pooni

LinkedIn Article · LLM Research

GenAI Emergence

Holy Grail of Zero-Shot Learning

Subramaniyam V. Pooni

LinkedIn Article · Transfer Learning

Zero-Shot LLM

Risks Associated with Emergent Properties

Subramaniyam V. Pooni

LinkedIn Article · AI Risk Assessment

AI Risk Safety

Size of Model at which Emergent Properties Occur

Subramaniyam V. Pooni

LinkedIn Article · Scaling Laws

Scaling Emergence

In-Depth Comparison: Auto-Regressive Models vs. Masked Language Models (MLMs)

Subramaniyam V. Pooni

LinkedIn Article · LLM Architecture

GPT BERT MLM

📡 AI-ENHANCED NETWORKING

Next Generation Networking Enhanced by AI

Subramaniyam V. Pooni

LinkedIn Article · Network Architecture

AI Networking SDN

Elimination of 5-Tuple Classification in Networking Using AI

Subramaniyam V. Pooni

LinkedIn Article · Traffic Classification

5-Tuple Flow

Introduction of New Traffic Flow Types Using AI Without Code Changes

Subramaniyam V. Pooni

LinkedIn Article · Adaptive Networks

Zero-Code Traffic

Tagging, Networks and AI

Subramaniyam V. Pooni

LinkedIn Article · Network Metadata

Tagging Metadata

Next-Generation NOC Powered by Generative AI

Subramaniyam V. Pooni

LinkedIn Article · Network Operations

NOC GenAI AIOps

Ethernet Scale-Up Fabrics: A Deep Dive

Subramaniyam V. Pooni

LinkedIn Article · Data Center Networking

Ethernet Scale-Up Fabric

💿 AI-ENHANCED STORAGE

Building Point-in-Time Filesystem Traversal

Subramaniyam V. Pooni

LinkedIn Article · Filesystem Architecture

Filesystem Snapshots Time Travel

Next Generation Storage Enhanced by AI

Subramaniyam V. Pooni

LinkedIn Article · Intelligent Storage

AI Storage Smart I/O

Storage Retrieval Inspired by Ray Tracing

Subramaniyam V. Pooni

LinkedIn Article · Novel Architectures

Ray Tracing Retrieval

DLSS and Data Retrieval

Subramaniyam V. Pooni

LinkedIn Article · GPU-Inspired Storage

DLSS Upscaling

Reed-Solomon Coding and AI: Enhancing Error Correction and Data Reliability

Subramaniyam V. Pooni

LinkedIn Article · Erasure Coding

Reed-Solomon ECC

Dispersed Storage, AI and Lagrange's Interpolation

Subramaniyam V. Pooni

LinkedIn Article · Distributed Storage

Dispersed Lagrange

🔀 DEEP NEURAL NETWORKS & PARALLELISM

Parallelism in AI

Subramaniyam V. Pooni

LinkedIn Article · Distributed Training

Data Parallel Model Parallel

Model Sharding + Layer Parallelism = Model Parallelism

Subramaniyam V. Pooni

LinkedIn Article · Large Model Training

Sharding Pipeline

Neural Layer Parallelism (Deep Dive)

Subramaniyam V. Pooni

LinkedIn Article · Layer-wise Training

Layer Parallel DNN

Awesome World of Federated Learning in Terms of Global and Local Models/Sites

Subramaniyam V. Pooni

LinkedIn Article · Privacy-Preserving ML

Federated Privacy

Personalized Models - Combining Transfer Learning with Federated Learning

Subramaniyam V. Pooni

LinkedIn Article · Personalization

Transfer Federated

Deep Models, Shallow Models and Overparameterization

Subramaniyam V. Pooni

LinkedIn Article · Model Theory

Overparameterization

Over-Parameterization Does Not Lead to Poor Generalization

Subramaniyam V. Pooni

LinkedIn Article · Generalization Theory

Generalization Theory

📶 WIRELESS COMMUNICATION & DEEP LEARNING

DNDR: End-to-End Learning with Different Functionality Discovered by Gradient Descent

Subramaniyam V. Pooni

LinkedIn Article · Neural Communication

DNDR E2E Learning

DNDR: A Comprehensive Exploration of Perspectives in End-to-End Communication Learning

Subramaniyam V. Pooni

LinkedIn Article · Communication Theory

DNDR Autoencoder

DeepSig Autoencoders and Meta-Learning Systems like DNDR: A Deep Dive

Subramaniyam V. Pooni

LinkedIn Article · Signal Processing

DeepSig Meta-Learning

In Search of Equivalent of CNNs for Wireless Communication

Subramaniyam V. Pooni

LinkedIn Article · Neural Wireless

CNN Wireless

🔬 NEURAL NETWORK THEORY & TOOLS

Understanding Distillation in AI: How Models Can Be Extracted

Subramaniyam V. Pooni

LinkedIn Article · Knowledge Distillation

Distillation Model Compression

Mysterious Latent Space - Math of the 21st Century

Subramaniyam V. Pooni

LinkedIn Article · Representation Learning

Latent Space Embeddings

Model Order Selection

Subramaniyam V. Pooni

LinkedIn Article · Model Selection

AIC BIC

Neural Studio

Subramaniyam V. Pooni

LinkedIn Article · Development Tools

IDE Neural

Workflow for Neural Layer Splitting

Subramaniyam V. Pooni

LinkedIn Article · Model Optimization

Layer Split Workflow

🔌 AI HARDWARE & INFRASTRUCTURE

Ray Tracing in Rust: Weekend Project with David Beazley

Subramaniyam V. Pooni

LinkedIn Post · Open Source Project · Rust vs C++ Performance

Rust Ray Tracing GPU/CUDA

Converted "Ray Tracing in One Weekend" from C++ to Rust, with the Rust port running faster than the C++ original. Explores multi-core parallelism, GPU acceleration with CUDA, and Rust optimization techniques.

AI Control Center

Subramaniyam V. Pooni

LinkedIn Article · AI Operations

AI Ops Control Plane

The Full Scope of FPGA, ASIC, and Hybrid Solutions in AI

Subramaniyam V. Pooni

LinkedIn Article · Hardware Accelerators

FPGA ASIC Hybrid

AI MicroClouds: A Deep Dive

Subramaniyam V. Pooni

LinkedIn Article · Edge Infrastructure

MicroCloud Edge AI

Emerging Trends in AI and Data Center Design: Examples

Subramaniyam V. Pooni

LinkedIn Article · Data Center Architecture

Data Center AI Infra

FaaS Platform Design

Subramaniyam V. Pooni

LinkedIn Article · Serverless Architecture

FaaS Serverless

🔒 TRADE SECRETS & PROPRIETARY RESEARCH

📡

SONiCS Platform (Futurewei)

12+ trade secrets filed for Self-Organizing Networks with intelligent Controllers. Includes FaaS architecture, MLaaS integration, and federated learning for wireless networks.

12+ Trade Secrets 2016-2019
🧠

GAN-Based Wireless Receiver

Award-winning research on Generative Adversarial Networks for software-defined wireless receivers. Collaboration with Berkeley AI Research (BAIR) and Georgia Tech.

🏆 Top Innovation Award 2019

🚀 UPCOMING & IN-PROGRESS

  • Agent Protocol Standardization (AP2, SLIM, X-A2A): defining interoperability standards for multi-agent systems
  • Hardware-Aware LLM Compilation: optimizing inference for heterogeneous GPU clusters (NVIDIA + AMD)
  • Agentic RAG with Long-Context Memory: scaling agent memory for enterprise knowledge bases
  • Rust-Based Interpreter Design Patterns: documenting learnings from the Crusty Lox implementation

"Building the future of intelligent systems, one agent at a time."


3265 Delta Rd, San Jose, CA 95135 · US Citizen
