The Building Blocks of Advanced Multi-GPU Communication


                  How NVLink and NVSwitch Work Together

                  NVIDIA NVLink

                  [Figure: NVIDIA A100 PCIe with NVLink GPU-to-GPU connection]
                  [Figure: NVIDIA A100 with NVLink GPU-to-GPU connections]

                  NVIDIA NVSwitch

                  [Figure: The NVSwitch topology diagram]

                  Maximizing System Throughput

                  Third-Generation NVLink

                  NVIDIA NVLink technology addresses interconnect bottlenecks by providing higher bandwidth, more links, and improved scalability for multi-GPU system configurations. A single NVIDIA A100 Tensor Core GPU supports up to 12 third-generation NVLink connections for a total bandwidth of 600 gigabytes per second (GB/s), almost 10X the bandwidth of PCIe Gen 4.
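The 600 GB/s figure can be checked with simple arithmetic. The sketch below assumes 50 GB/s of bidirectional bandwidth per third-generation NVLink link (the value implied by 600 GB/s spread across 12 links) and roughly 64 GB/s bidirectional for a PCIe Gen 4 x16 slot. The PCIe figure and every name in the code are assumptions made for illustration, not values stated on this page.

```python
# Rough bandwidth arithmetic for one A100 GPU (illustrative sketch only).

# Assumption: each third-generation NVLink link carries 50 GB/s
# bidirectional, i.e. 600 GB/s divided evenly across 12 links.
NVLINK3_LINKS_PER_GPU = 12
NVLINK3_GBPS_PER_LINK = 50.0

# Assumption: a PCIe Gen 4 x16 slot offers about 64 GB/s bidirectional.
PCIE_GEN4_X16_GBPS = 64.0


def total_nvlink_bandwidth(links: int, gbps_per_link: float) -> float:
    """Return the combined NVLink bandwidth of one GPU in GB/s."""
    return links * gbps_per_link


total = total_nvlink_bandwidth(NVLINK3_LINKS_PER_GPU, NVLINK3_GBPS_PER_LINK)
ratio = total / PCIE_GEN4_X16_GBPS

print(f"NVLink total: {total:.0f} GB/s")
print(f"Ratio to PCIe Gen 4 x16: {ratio:.1f}x")
```

Under these assumptions the ratio comes out a little above 9, which is in line with the "almost 10X" claim.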

                  Servers like the NVIDIA DGX™ A100 take advantage of this technology to deliver greater scalability for ultrafast deep learning training. NVLink is also available in A100 PCIe two-GPU configurations.

                  NVLink Performance

                  [Figure: NVLink in NVIDIA A100]

                  NVIDIA NVSwitch

                  NVSwitch—The Fully Connected NVLink

                  The rapid adoption of deep learning has driven the need for a faster, more scalable interconnect, as PCIe bandwidth often creates a bottleneck at the multi-GPU-system level. For deep learning workloads to scale, dramatically higher bandwidth and reduced latency are needed.

                  NVIDIA NVSwitch builds on the advanced communication capability of NVLink to solve this problem. It takes deep learning performance to the next level with a GPU fabric that enables more GPUs in a single server and full-bandwidth connectivity between them. Each GPU connects to the NVSwitch fabric through its 12 NVLinks, enabling high-speed, all-to-all communication.

                  The Most Powerful End-to-End AI and HPC Data Center Platform

                  NVLink and NVSwitch are essential building blocks of the complete NVIDIA data center solution that incorporates hardware, networking, software, libraries, and optimized AI models and applications from NGC™. The most powerful end-to-end AI and HPC platform, it allows researchers to deliver real-world results and deploy solutions into production, driving unprecedented acceleration at every scale.

                  Full Connection for Unparalleled Performance

                  NVSwitch is the first on-node switch architecture to support eight to 16 fully connected GPUs in a single server node. The second-generation NVSwitch drives simultaneous communication between all GPU pairs at an incredible 600 GB/s. It supports full all-to-all communication with direct GPU peer-to-peer memory addressing. These 16 GPUs can be used as a single high-performance accelerator with unified memory space and up to 10 petaFLOPS of deep learning compute power.
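To put "all GPU pairs" in perspective, a fully connected group of n GPUs contains n(n-1)/2 distinct pairs. The following is a minimal sketch with hypothetical function names. The per-GPU compute value is simply the quoted "up to 10 petaFLOPS" divided evenly across 16 GPUs, which is an assumption rather than a published per-GPU specification.

```python
def gpu_pairs(num_gpus: int) -> int:
    """Number of distinct GPU pairs in a fully connected group."""
    return num_gpus * (num_gpus - 1) // 2


def per_gpu_teraflops(total_petaflops: float, num_gpus: int) -> float:
    """Evenly split a system-level petaFLOPS figure into teraFLOPS per GPU."""
    return total_petaflops * 1000.0 / num_gpus


# Pair counts for the eight- and 16-GPU configurations mentioned above.
for n in (8, 16):
    print(f"{n} GPUs -> {gpu_pairs(n)} GPU pairs")

# Assumption: the 10 petaFLOPS peak is shared equally by 16 GPUs.
print(f"Per-GPU share: {per_gpu_teraflops(10.0, 16):.0f} teraFLOPS")
```

Going from eight to 16 GPUs roughly quadruples the number of pairs (28 to 120), which is why a switched fabric scales better than wiring every pair directly.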


                  NVIDIA NVLink

                                                          Second Generation    Third Generation
                  Total NVLink Bandwidth                  300 GB/s             600 GB/s
                  Maximum Number of Links per GPU         6                    12
                  Supported NVIDIA Architectures          NVIDIA Volta         NVIDIA Ampere Architecture

                  NVIDIA NVSwitch

                                                          First Generation     Second Generation
                  Number of GPUs with Direct Connection   Up to 16             Up to 16
                  NVSwitch GPU-to-GPU Bandwidth           300 GB/s             600 GB/s
                  Total Aggregate Bandwidth               4.8 TB/s             9.6 TB/s
                  Supported NVIDIA Architectures          NVIDIA Volta         NVIDIA Ampere Architecture
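The aggregate figures in the NVSwitch specifications are consistent with multiplying the GPU-to-GPU bandwidth by the maximum GPU count. A quick check is sketched below. The names are hypothetical, and decimal units (1 TB/s = 1000 GB/s) are assumed.

```python
MAX_GPUS = 16            # "Up to 16" GPUs with direct connection
GBPS_PER_TBPS = 1000.0   # assumption: decimal units

# GPU-to-GPU bandwidth in GB/s for each NVSwitch generation.
nvswitch_gpu_bandwidth = {
    "first": 300.0,
    "second": 600.0,
}


def aggregate_tbps(per_gpu_gbps: float, num_gpus: int = MAX_GPUS) -> float:
    """Total bandwidth in TB/s when every GPU runs at full rate."""
    return per_gpu_gbps * num_gpus / GBPS_PER_TBPS


for generation, gbps in nvswitch_gpu_bandwidth.items():
    print(f"{generation} generation: {aggregate_tbps(gbps):.1f} TB/s")
```

The results match the 4.8 TB/s and 9.6 TB/s totals listed for the first and second generations.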

                  Get Started

                  Experience NVIDIA DGX A100, the universal system for AI infrastructure and the world’s first AI system built on the NVIDIA A100 Tensor Core GPU.