GPU-Accelerated Deep Learning Primitives • 2026
How deep learning primitives fit into the GPU computing ecosystem
How data moves through the deep learning primitive pipeline
Key features and optimizations in each library
Hardware-accelerated matrix multiplication at the heart of AI
Detailed breakdown of capabilities across both libraries
How cuDNN and MIOpen choose the optimal kernel for your workload
Conceptual comparison based on typical workloads (actual results vary by configuration)
Side-by-side code examples for common operations
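// cuDNN: create handle and descriptors
cudnnHandle_t handle;
cudnnCreate(&handle);

cudnnTensorDescriptor_t xDesc, yDesc;
cudnnCreateTensorDescriptor(&xDesc);
cudnnCreateTensorDescriptor(&yDesc);
cudnnSetTensor4dDescriptor(xDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, N, C, H, W);

cudnnFilterDescriptor_t wDesc;
cudnnCreateFilterDescriptor(&wDesc);
cudnnSetFilter4dDescriptor(wDesc, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW, K, C, R, S);

cudnnConvolutionDescriptor_t convDesc;
cudnnCreateConvolutionDescriptor(&convDesc);
cudnnSetConvolution2dDescriptor(convDesc, pad, pad, stride, stride, 1, 1,
                                CUDNN_CROSS_CORRELATION, CUDNN_DATA_FLOAT);

// Size the output descriptor from the input, filter, and convolution settings
int outN, outC, outH, outW;
cudnnGetConvolution2dForwardOutputDim(convDesc, xDesc, wDesc, &outN, &outC, &outH, &outW);
cudnnSetTensor4dDescriptor(yDesc, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT, outN, outC, outH, outW);

// Algorithm selection: heuristic ranking without benchmarking
// (cudnnFindConvolutionForwardAlgorithm benchmarks the candidates instead)
int returnedCount = 0;
cudnnConvolutionFwdAlgoPerf_t perfResults[1];
cudnnGetConvolutionForwardAlgorithm_v7(handle, xDesc, wDesc, convDesc, yDesc,
                                       1, &returnedCount, perfResults);
cudnnConvolutionFwdAlgo_t algo = perfResults[0].algo;

// Execute convolution (x, w, y, workspace are device buffers; alpha and beta are host scalars)
cudnnConvolutionForward(handle, &alpha, xDesc, x, wDesc, w, convDesc, algo,
                        workspace, wsSize, &beta, yDesc, y);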
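// MIOpen: create handle and descriptors
miopenHandle_t handle;
miopenCreate(&handle);

miopenTensorDescriptor_t xDesc, yDesc;
miopenCreateTensorDescriptor(&xDesc);
miopenCreateTensorDescriptor(&yDesc);
miopenSet4dTensorDescriptor(xDesc, miopenFloat, N, C, H, W);

// MIOpen describes filters with an ordinary tensor descriptor
miopenTensorDescriptor_t wDesc;
miopenCreateTensorDescriptor(&wDesc);
miopenSet4dTensorDescriptor(wDesc, miopenFloat, K, C, R, S);

miopenConvolutionDescriptor_t convDesc;
miopenCreateConvolutionDescriptor(&convDesc);
miopenInitConvolutionDescriptor(convDesc, miopenConvolution, pad, pad, stride, stride, 1, 1);

// Size the output descriptor from the input, filter, and convolution settings
int outN, outC, outH, outW;
miopenGetConvolutionForwardOutputDim(convDesc, xDesc, wDesc, &outN, &outC, &outH, &outW);
miopenSet4dTensorDescriptor(yDesc, miopenFloat, outN, outC, outH, outW);

// Solution finding: benchmark algorithms (last argument false = no exhaustive search)
int returnedCount = 0;
miopenConvAlgoPerf_t perfResults;
miopenFindConvolutionForwardAlgorithm(handle, xDesc, x, wDesc, w, convDesc, yDesc, y,
                                      1, &returnedCount, &perfResults,
                                      workspace, wsSize, false);

// Execute convolution with the algorithm reported by the find step
miopenConvolutionForward(handle, &alpha, xDesc, x, wDesc, w, convDesc,
                         perfResults.fwd_algo, &beta, yDesc, y, workspace, wsSize);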
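Both snippets need the output tensor's shape before yDesc can be used. The following is a minimal sketch of the standard output-size arithmetic for one spatial dimension, written without either library so it can be checked on its own. The helper name conv_out_dim is hypothetical and is not part of cuDNN or MIOpen; the parameters mirror the pad, stride, and dilation values passed to the convolution descriptors above.

```cpp
#include <cassert>

// Hypothetical helper (not a cuDNN or MIOpen function): output extent of one
// spatial dimension of a forward convolution.
//   in       input extent (H or W)
//   pad      zero padding added on each side
//   filter   filter extent (R or S)
//   stride   step between filter applications
//   dilation spacing between filter taps (1 = dense filter)
int conv_out_dim(int in, int pad, int filter, int stride, int dilation) {
    assert(filter > 0 && stride > 0 && dilation > 0);
    // A dilated filter covers dilation * (filter - 1) + 1 input positions.
    int effective_filter = dilation * (filter - 1) + 1;
    assert(in + 2 * pad >= effective_filter);
    // Integer division rounds down, discarding a partial final window.
    return 1 + (in + 2 * pad - effective_filter) / stride;
}
```

For example, a 224-wide input with a 7-wide filter, padding 3, and stride 2 gives conv_out_dim(224, 3, 7, 2, 1) == 112. In real code it is usually safer to query the library for the output dimensions, since that keeps the result in sync with the descriptor settings.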
Key milestones in GPU deep learning primitive development