Building Agentic LLM Systems with LangChain, Multi-Agent RAG & Agent-Based Reasoning
AP2 | SLIM | X-A2A | LLMOps | Scalable AI Inference | Compilers: MLIR, LLVM, WASM
Enterprise AI Agent Development & LLMOps
Cognitive Systems × Scalable Behaviors
CS²B Technologies bridges the gap between AI research and production systems. We design, build, and operate multi-agent AI systems (from RAG pipelines to protocol integrations) with the observability and guardrails required for real-world enterprise deployment.
"We help enterprises deploy and operate AI agent systems reliably at scale."
Three decades of innovation across enterprise, startups, and research
Chief Architect & Founding Engineer, Agentic Intelligence
Sep 2025 - Present · San Jose, CA
Principal Architect, GenAI | HPC | SDDC
Dec 2021 - Aug 2025 · 3 yrs 9 mos · Palo Alto, CA
Founder CTO, Telco Cloud / 5G Open RAN
Dec 2020 - Dec 2021 · 1 yr 1 mo · San Jose, CA
Career Break · Key Projects & Innovations
Sep 2019 - Dec 2020 · 1 yr 4 mos · San Jose
Principal Lead · Software Competence Centre for Wireless, CTO Office
Feb 2016 - Sep 2019 · 3 yrs 8 mos · Santa Clara
Director of Research & Development, CTO Office
Dec 2011 - Jan 2016 · 4 yrs 2 mos · San Jose
Principal SDN Architect
Aug 2010 - Dec 2011 · 1 yr 5 mos · SF Bay Area
Network & Storage Architect
Nov 2007 - Aug 2010 · 2 yrs 10 mos · Folsom
Software Architect, Networking & Storage
Jul 1997 - Dec 2007 · 10 yrs 6 mos · Roseville
Systems Software Engineer
Jul 1997 - Dec 1999 · 2 yrs 6 mos · Pleasanton
Software Engineer, Base Operating Systems
Mar 1994 - Jul 1997 · 3 yrs 5 mos · Beaverton, OR
11 high-impact projects across AI, HPC, and systems engineering
Broadcom · Dec 2020 - Aug 2025
Advanced automation with Aria 8.x, HA clustering, SAML/LDAP, RBAC. Self-service hybrid cloud across VCF, AWS, Azure, GCP.
Broadcom · Jan 2025 - Aug 2025
Prompt-based LLM fine-tuning for Mistral-7B, Phi-2, and Gemma. RLHF evaluations, GPU-aware observability, and model versioning.
Broadcom · LangChain, Triton, FastAPI · Jun 2024 - Aug 2025
Modular agents for RAG-backed assistants and structured extractors with full observability.
Broadcom · VMware PAIF-N · Jun 2024 - Aug 2025
VMware Private AI Foundation with NVIDIA for CSPs. Monetization models: GPU-as-a-Service (GaaS), AI PaaS, Model-as-a-Service, and AI applications.
Broadcom · TensorRT, vLLM, Triton · Jun 2024 - Aug 2025
Training/inference optimization for Llama 3.1, Mixtral, BERT on NVIDIA (B200, GH200, H100) & AMD (MI350X).
CSSQUAREDB · Python, Asyncio, Raft · Sep 2019 - Dec 2020
High-performance distributed training across multi-node clusters. Fault-tolerant key-value store with Raft consensus.
CSSQUAREDB · Python, LLVM, MLIR · Nov 2019 - Dec 2020
Custom MLIR-based compiler targeting WebAssembly. C-like DSL with LLVM IR and WASM backends.
Personal · Crafting Interpreters · Dec 2024 - Present
Tree-walking interpreter in idiomatic Rust: scanner, recursive-descent parser, and tree-walk evaluator.
CSSQUAREDB · Python · Apr 2020 - Jun 2020
Modular software engineering framework. Functional, Reactive, Actor Model paradigms with test harness.
Personal · May 2020 - Jul 2020
Exploration of JDK 9-14 features, Scala essentials, functional programming, Java modules, and design patterns.
Personal · Rust, Graphics, Linear Algebra
3D rendering engine with ray casting, materials, lighting, and reflections.
300+ technologies across the full stack
6 patents in storage and networking technologies
GANs for Wireless
Futurewei Technologies · 2019
AI in Wireless Space
Huawei NJ Research · Dec 2018
GAN SW Receiver
Huawei NJ Research · Dec 2018
Outstanding AI Contributions
Huawei · 2017
Continuous learning in cutting-edge technologies · 20+ certifications
Advanced hands-on technical training programs
Software-defined networking and security
Infrastructure as code and cloud management
Core virtualization and cloud service provider platforms
Recognition for outstanding contributions and teamwork
Complete course materials from VMware training programs
Specialized training in programming and systems design
World-class foundation
Research Assistant
Computer Speech Lab · Indian Institute of Technology
Master's Degree
Management Information Systems · First Class with Distinction
Bachelor's Degree
Computer Science & Engineering · 1987-1991
Ongoing research contributions in AI, distributed systems, and compiler technologies
Multi-agent orchestration, ReAct patterns, tool-augmented reasoning, and autonomous AI workflows using LangChain, LangGraph, and custom agent frameworks.
High-performance inference pipelines, quantization strategies, KV-cache optimization, and hardware-aware compilation for NVIDIA and AMD accelerators.
MLIR-based compiler design, LLVM optimization passes, WebAssembly targets for edge AI, and domain-specific language development.
Subramaniyam V. Pooni
Industry Report · January 2026
Comprehensive analysis of the AI accelerator landscape in 2026, covering GPU, TPU, and custom ASIC architectures from NVIDIA, AMD, Intel, Google, and emerging players.
Read Full Report →
Subramaniyam V. Pooni
Technical Documentation · 2026 · 6 Chapters · CUDA 14.x | Rubin Architecture
Comprehensive technical exploration of GPU computing architecture, from PTX binaries to kernel execution. Covers the CUDA compilation pipeline and GPU architecture evolution (Pascal → Volta → Ampere → Hopper → Blackwell → Rubin).
Explore CUDA Deep Dive →
Subramaniyam V. Pooni
Technical Documentation · 2026 · 7 Chapters · ROCm 6.x | MI300X/MI350X
Comprehensive technical exploration of AMD's ROCm platform and CDNA architecture. Covers HIP programming, the AMDGPU compiler, Instinct accelerators (MI250 → MI300 → MI350), and datacenter GPU deployments.
Explore ROCm Deep Dive →
Subramaniyam V. Pooni
Technical Documentation · 2026 · 3 Chapters · TPU v7 Ironwood | JAX/XLA Stack
Comprehensive technical exploration of Google's Tensor Processing Units, the XLA compiler infrastructure, and the JAX programming framework. Covers TPU architecture evolution (v1 → v4 → v5e → v5p → Trillium → Ironwood).
Explore TPU Deep Dive →
Subramaniyam V. Pooni
Technical Documentation · 2026 · 7 Chapters · Trainium3 | Neuron SDK | NKI
Comprehensive technical exploration of Amazon's Trainium AI accelerators, the Neuron SDK compiler infrastructure, and the NeuronCore architecture. Covers Trainium evolution (1 → 2 → 3) and NKI kernel programming.
Explore Trainium Deep Dive →
Subramaniyam V. Pooni
Technical Reference · December 2025 · 8 Sections · UEC Spec 1.0 | NVMe-oF 1.1
Comprehensive analysis of high-performance networking for AI/HPC and storage. Covers RDMA fundamentals, UEC architecture, memory operations, flow control, NVMe over Fabrics, AI collectives.
Subramaniyam V. Pooni
LinkedIn Article · 2025 · Industry Analysis
Analysis of how CXL and UEC technologies can be integrated to bridge internal memory fabric with external network fabric. Explores cache-coherent interconnects and memory pooling.
Read on LinkedIn →
Subramaniyam V. Pooni
LinkedIn Article · 2025 · AI Infrastructure
Analysis of storage architecture requirements for next-generation AI workloads. Explores checkpoint/restart patterns, model weight distribution, and the evolving storage hierarchy.
Read on LinkedIn →
Subramaniyam V. Pooni
Technical Documentation · 2025 · 4 Chapters, 26 Sections, 1MB+
Publication-quality technical documentation on GPU-storage integration challenges. Covers NVMe queue architecture, doorbell serialization, GPUDirect Storage, CXL memory semantics.
Read Full Documentation →
Subramaniyam V. Pooni
Technical Reference v3.0 · December 2025 · 13 Chapters, 10 Appendices
Memory-efficient LLM serving using CXL-based intelligent memory endpoints. Per-head tracking, EMA-based attention scoring, RoPE-aware prefetch, hardware-accelerated cache management.
Subramaniyam V. Pooni
LinkedIn Article · 2025 · AI/ML Architecture
Visual guide exploring the intersection of Graph Neural Networks and Large Language Models. Covers message passing, attention mechanisms, graph transformers, knowledge graph integration.
Read on LinkedIn →
Subramaniyam V. Pooni
Working Paper · 2025
Proposes a scalable architecture for deploying multi-agent RAG systems in enterprise environments, addressing challenges in agent coordination, memory management, and inference optimization.
Subramaniyam V. Pooni
Technical Documentation · CS²B Technologies · 2025 · 16 Chapters
Comprehensive research document exploring MCP, A2A, and emerging standards for multi-agent communication. Deep dives into context engineering frameworks including the WSCI methodology. Features 6 protocols, 4 frameworks, and 50+ code examples.
View Agent Protocols Research →
Subramaniyam V. Pooni
LinkedIn Article · 2025 · CPU Microarchitecture
Deep dive into Bridge Checkpoint Unit (BCU) microarchitecture for modern out-of-order processors. Explores checkpoint/rollback mechanisms, speculative execution recovery, register renaming.
Read on LinkedIn →
Subramaniyam V. Pooni
Technical Report · CSSQUAREDB Technologies · 2020
Presents a novel MLIR-based compiler toolchain targeting WebAssembly for deploying lightweight AI models on edge devices with near-native performance.
Subramaniyam V. Pooni
Technical Report · CSSQUAREDB Technologies · 2020
Introduces a fault-tolerant parameter server architecture using Raft consensus for distributed machine learning, achieving 40%+ throughput improvement in federated learning scenarios.
Subramaniyam V. Pooni
Technical Documentation · CS²B Technologies · 2025 · 9 Phases
Comprehensive multi-target compiler infrastructure covering lexical analysis, LALR parsing, type checking, stack-based IR design, WebAssembly binary encoding, LLVM code generation, and Python bytecode emission.
View Compiler Deep Dive →
Subramaniyam V. Pooni
Technical Documentation · CS²B Technologies · 2025 · Full Implementation
Complete Raft consensus algorithm implementation with leader election, log replication, cluster membership, and simulation framework. Includes comprehensive API reference and interactive visualizations.
View Raft Documentation →
Subramaniyam V. Pooni
LinkedIn Article · AI Safety & Alignment
Subramaniyam V. Pooni
LinkedIn Article · LLM Research
Subramaniyam V. Pooni
LinkedIn Article · Transfer Learning
Subramaniyam V. Pooni
LinkedIn Article · AI Risk Assessment
Subramaniyam V. Pooni
LinkedIn Article · Scaling Laws
Subramaniyam V. Pooni
LinkedIn Article · LLM Architecture
Subramaniyam V. Pooni
LinkedIn Article · Network Architecture
Subramaniyam V. Pooni
LinkedIn Article · Traffic Classification
Subramaniyam V. Pooni
LinkedIn Article · Adaptive Networks
Subramaniyam V. Pooni
LinkedIn Article · Network Metadata
Subramaniyam V. Pooni
LinkedIn Article · Network Operations
Subramaniyam V. Pooni
LinkedIn Article · Data Center Networking
Subramaniyam V. Pooni
LinkedIn Article · Filesystem Architecture
Subramaniyam V. Pooni
LinkedIn Article · Intelligent Storage
Subramaniyam V. Pooni
LinkedIn Article · Novel Architectures
Subramaniyam V. Pooni
LinkedIn Article · GPU-Inspired Storage
Subramaniyam V. Pooni
LinkedIn Article · Erasure Coding
Subramaniyam V. Pooni
LinkedIn Article · Distributed Storage
Subramaniyam V. Pooni
LinkedIn Article · Distributed Training
Subramaniyam V. Pooni
LinkedIn Article · Large Model Training
Subramaniyam V. Pooni
LinkedIn Article · Layer-wise Training
Subramaniyam V. Pooni
LinkedIn Article · Privacy-Preserving ML
Subramaniyam V. Pooni
LinkedIn Article · Personalization
Subramaniyam V. Pooni
LinkedIn Article · Model Theory
Subramaniyam V. Pooni
LinkedIn Article · Generalization Theory
Subramaniyam V. Pooni
LinkedIn Article · Neural Communication
Subramaniyam V. Pooni
LinkedIn Article · Communication Theory
Subramaniyam V. Pooni
LinkedIn Article · Signal Processing
Subramaniyam V. Pooni
LinkedIn Article · Neural Wireless
Subramaniyam V. Pooni
LinkedIn Article · Knowledge Distillation
Subramaniyam V. Pooni
LinkedIn Article · Representation Learning
Subramaniyam V. Pooni
LinkedIn Article · Model Selection
Subramaniyam V. Pooni
LinkedIn Article · Development Tools
Subramaniyam V. Pooni
LinkedIn Article · Model Optimization
Subramaniyam V. Pooni
LinkedIn Post · Open Source Project · Rust vs C++ Performance
Converted "Ray Tracing in One Weekend" from C++ to Rust, with the Rust port running faster than the C++ original. Explores multi-core parallelism, GPU acceleration with CUDA, and Rust optimization techniques.
Read on LinkedIn →
Subramaniyam V. Pooni
LinkedIn Article · AI Operations
Subramaniyam V. Pooni
LinkedIn Article · Hardware Accelerators
Subramaniyam V. Pooni
LinkedIn Article · Edge Infrastructure
Subramaniyam V. Pooni
LinkedIn Article · Data Center Architecture
Subramaniyam V. Pooni
LinkedIn Article · Serverless Architecture
12+ trade secrets filed for Self-Organizing Networks with intelligent controllers, covering FaaS architecture, MLaaS integration, and federated learning for wireless networks.
Award-winning research on Generative Adversarial Networks for software-defined wireless receivers. Collaboration with Berkeley AI Research (BAIR) and Georgia Tech.
"Building the future of intelligent systems, one agent at a time."
3265 Delta Rd, San Jose, CA 95135 · US Citizen
Get in Touch