Purpose-built for the convergence of simulation, data analytics, and AI

Massive datasets, huge deep learning models, and complex simulations require multiple GPUs with extremely fast interconnects and a fully accelerated software stack. The NVIDIA HGX™ AI supercomputing platform combines the full power of NVIDIA GPUs, NVIDIA® NVLink®, NVIDIA InfiniBand networking, and a fully optimized NVIDIA AI and HPC software stack from the NVIDIA NGC™ catalog for the highest application performance. With end-to-end performance and flexibility, NVIDIA HGX enables researchers and scientists to combine simulation, data analytics, and AI to drive scientific progress.

UNMATCHED END-TO-END PLATFORM FOR ACCELERATED COMPUTING

NVIDIA HGX combines NVIDIA A100 Tensor Core GPUs with high-speed interconnects to form the world's most powerful servers. With 16 A100 GPUs, HGX A100 delivers up to 1.3 terabytes (TB) of GPU memory and over 2 terabytes per second (TB/s) of memory bandwidth for unprecedented acceleration.

Compared to previous generations, HGX delivers up to a 20x AI speedup with Tensor Float 32 (TF32) and a 2.5x HPC speedup with FP64. Delivering a staggering 10 petaFLOPS, NVIDIA HGX forms the most powerful accelerated, vertically scalable server platform for AI and HPC.
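As an illustration of how applications tap the TF32 Tensor Cores on A100-class GPUs, the sketch below uses PyTorch; the flags shown are PyTorch's own, and the matrix sizes are arbitrary placeholders.

```python
import torch

# Illustrative: opt in to TF32 Tensor Core math for matrix multiplies and
# cuDNN convolutions. On A100-class (Ampere) GPUs these paths run on Tensor Cores.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
c = a @ b  # executed in TF32 on Tensor Cores when the flags above are set
print(c.shape)
```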

HGX has been extensively tested and is easy to deploy, integrating into partner servers for guaranteed performance. The HGX platform is available as 4-GPU and 8-GPU HGX motherboards with SXM GPUs, and as PCIe GPUs for a modular deployment option that delivers the highest compute performance in mainstream servers.

[Image: HGX application and software stack]

With 144 cores and 1 TB/s of memory bandwidth, the NVIDIA Grace CPU Superchip delivers unprecedented performance for CPU-based high-performance computing applications. HPC applications are compute-intensive and require the most powerful cores, the highest memory bandwidth, and the right amount of memory per core to accelerate results.

The Grace CPU Superchip and Grace Hopper Superchip are expected to be available in the first half of 2023.

DEEP LEARNING PERFORMANCE

UP TO 3X FASTER AI TRAINING ON THE LARGEST MODELS

DLRM Training
[Image: HGX DLRM training performance]
DLRM on HugeCTR framework, precision = FP16 | NVIDIA A100 80 GB, batch size = 48 | NVIDIA A100 40 GB, batch size = 32 | NVIDIA V100 32 GB, batch size = 32.

The size and complexity of deep learning models have exploded, requiring systems with large amounts of memory, massive compute, and fast interconnects for scalability. With ultra-fast all-to-all GPU communication through NVIDIA® NVSwitch™, HGX A100 provides enough power to handle even the most advanced AI models. A100 80 GB GPUs double the GPU memory, so a single HGX A100 provides up to 1.3 TB of memory. Steadily growing workloads on the very largest models, such as deep learning recommendation models (DLRM) with their massive data tables, are accelerated by up to 3x over HGX systems with A100 40 GB GPUs.
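For context on the FP16 precision used in the DLRM benchmark above, here is a minimal mixed-precision training sketch in PyTorch. It is not the HugeCTR DLRM benchmark itself; the model, data, and sizes are placeholders.

```python
import torch
import torch.nn as nn

device = "cuda"
# Placeholder model standing in for a recommender network (not HugeCTR DLRM).
model = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 1)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # keeps FP16 gradients numerically stable

features = torch.randn(48, 64, device=device)  # batch size 48, as in the A100 80 GB run
labels = torch.rand(48, 1, device=device)

for _ in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # forward pass runs in mixed FP16/FP32 precision
        loss = nn.functional.binary_cross_entropy_with_logits(model(features), labels)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```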

MACHINE LEARNING PERFORMANCE

2x faster than A100 40 GB on the big data analytics benchmark

Big data analytics benchmark
[Image: HGX big data analytics performance]
Big data analytics benchmark | 30 analytical retail queries, ETL, ML, NLP on a 10 TB dataset | V100 32 GB, RAPIDS/Dask | A100 40 GB and A100 80 GB, RAPIDS/Dask/BlazingSQL

Machine learning models require loading, transforming, and processing very large datasets to extract insights. With up to 1.3 TB of unified memory and all-to-all GPU communication via NVSwitch, HGX A100 with A100 80 GB GPUs has the power to load and compute on huge datasets and deliver actionable insights quickly.

On a big data analytics benchmark, the A100 80 GB delivered insights with 2x the throughput of the A100 40 GB, making it well suited to workloads with ever-growing datasets.
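As a rough sketch of the RAPIDS/Dask style of GPU data analytics referenced in the benchmark, the snippet below spreads a simple query across all local GPUs. The file path and column names are hypothetical.

```python
import dask_cudf
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

# One Dask worker per local GPU; each worker holds cuDF partitions in GPU memory.
cluster = LocalCUDACluster()
client = Client(cluster)

# Hypothetical dataset and columns; read_parquet partitions the data across GPUs.
ddf = dask_cudf.read_parquet("retail_transactions.parquet")

# Simple ETL + aggregation executed on the GPUs, then gathered to the client.
result = (
    ddf[ddf["quantity"] > 0]
    .groupby("store_id")["sales"]
    .sum()
    .compute()
)
print(result.head())
```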

HPC PERFORMANCE

HPC applications need to perform enormous amounts of computation every second. Dramatically increasing the compute density of each server node significantly reduces the number of servers required, which translates into large cost savings and lower space and energy requirements in data centers. For HPC simulations and their high-dimensional matrix multiplications, a processor must fetch data from many neighbors, making GPUs connected by NVLink ideal. HPC applications can also leverage TF32 on the A100 to achieve up to 11x higher throughput for single-precision dense matrix-multiply operations.
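To illustrate the kind of GPU-to-GPU exchange that benefits from NVLink, here is a minimal PyTorch all-reduce sketch using the NCCL backend, which uses NVLink/NVSwitch paths where available. It would be launched with torchrun (one process per GPU); the script name and tensor size are placeholders.

```python
import torch
import torch.distributed as dist

def main():
    # NCCL transparently uses NVLink/NVSwitch links between GPUs when present.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Each rank contributes a tensor; all_reduce sums it across every GPU.
    x = torch.ones(1 << 20, device="cuda") * (rank + 1)
    dist.all_reduce(x, op=dist.ReduceOp.SUM)

    if rank == 0:
        print("all_reduce done, first element:", x[0].item())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```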

An HGX A100 with A100 80 GB GPUs provides a two-fold throughput increase over A100 40 GB GPUs in Quantum Espresso, a materials simulation, shortening the time to insight.

11x more HPC performance in four years

Leading HPC applications
[Image: HGX application throughput vs. P100]
Geometric mean of application speedups vs. P100. Benchmark applications: Amber [PME-Cellulose_NVE], Chroma [szscl21_24_128], GROMACS [ADH Dodec], MILC [Apex Medium], NAMD [stmv_nve_cuda], PyTorch [BERT Fast Fine Tuning], Quantum Espresso [AUSURF112-jR], Random Forest FP32 [make_blobs (160000 x 64:10)], TensorFlow [ResNet-50], VASP 6 [Si Huge] | GPU nodes with dual-socket CPUs and 4x NVIDIA P100, V100, or A100 GPUs.

Up to 1.8x faster performance for HPC applications

Quantum Espresso
[Image: HGX Quantum Espresso performance]
Quantum Espresso measurement with CNT10POR8 dataset, precision = FP64.

HGX A100 SPECIFICATIONS

NVIDIA HGX is available as a single motherboard with four or eight A100 GPUs, each with 40 GB or 80 GB of GPU memory. The 4-GPU configuration is fully interconnected with NVIDIA NVLink®, and the 8-GPU configuration is interconnected via NVSwitch. Two NVIDIA HGX A100 8-GPU motherboards can be combined using the NVSwitch interconnect to create a single high-performance 16-GPU node.
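As an illustrative way to inspect such a topology from software, the sketch below enumerates the visible GPUs and queries peer-to-peer access with PyTorch; it only reports what the driver exposes and assumes a CUDA-capable node.

```python
import torch

n = torch.cuda.device_count()
print(f"{n} CUDA devices visible")

for i in range(n):
    print(i, torch.cuda.get_device_name(i))

# On NVLink/NVSwitch-connected SXM motherboards, peer access is typically
# reported between every pair of GPUs.
for i in range(n):
    peers = [j for j in range(n) if j != i and torch.cuda.can_device_access_peer(i, j)]
    print(f"GPU {i} has peer access to: {peers}")
```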

HGX is also available in a PCIe form factor as an easy-to-deploy option that brings the highest compute performance to mainstream servers, with 40 GB or 80 GB of GPU memory per GPU.

This powerful combination of hardware and software lays the foundation for the ultimate AI supercomputing platform.

|  | A100 PCIe | 4-GPU | 8-GPU | 16-GPU |
| --- | --- | --- | --- | --- |
| GPUs | 1x NVIDIA A100 PCIe | HGX A100 4-GPU | HGX A100 8-GPU | 2x HGX A100 8-GPU |
| Form factor | PCIe | 4x NVIDIA A100 SXM | 8x NVIDIA A100 SXM | 16x NVIDIA A100 SXM |
| HPC and AI compute (FP64/TF32*/FP16*/INT8*) | 19.5 TF / 312 TF* / 624 TF* / 1.2 POPS* | 78 TF / 1.25 PF* / 2.5 PF* / 5 POPS* | 156 TF / 2.5 PF* / 5 PF* / 10 POPS* | 312 TF / 5 PF* / 10 PF* / 20 POPS* |
| GPU memory | 40 or 80 GB per GPU | Up to 320 GB | Up to 640 GB | Up to 1,280 GB |
| NVLink | Third generation | Third generation | Third generation | Third generation |
| NVSwitch | N/A | N/A | Second generation | Second generation |
| NVSwitch GPU-to-GPU bandwidth | N/A | N/A | 600 GB/s | 600 GB/s |
| Total aggregate bandwidth | 600 GB/s | 2.4 TB/s | 4.8 TB/s | 9.6 TB/s |

* With sparsity

INSIGHT INTO THE NVIDIA AMPERE ARCHITECTURE

Read this technical paper to learn what's new with the NVIDIA Ampere architecture and its implementation in the NVIDIA A100 GPU.