

SPECIFICALLY DESIGNED FOR THE CONVERGENCE OF SIMULATION, DATA ANALYTICS AND AI

Massive datasets, huge Deep Learning models and complex simulations require multiple GPUs with extremely fast interconnects and a fully accelerated software stack. The NVIDIA HGX™ AI supercomputing platform combines the full power of NVIDIA GPUs, NVIDIA® NVLink®, NVIDIA InfiniBand networking, and a fully optimised NVIDIA AI and HPC software stack from the NVIDIA NGC™ catalogue for the highest application performance. With end-to-end performance and flexibility, NVIDIA HGX enables researchers and scientists to combine simulation, data analytics and AI to drive scientific progress.

UNRIVALLED END-TO-END PLATFORM FOR ACCELERATED COMPUTING

NVIDIA HGX powers the world's most powerful servers, combining NVIDIA A100 Tensor Core GPUs with high-speed interconnects. With 16 A100 GPUs, HGX A100 offers up to 1.3 terabytes (TB) of GPU memory and over 2 terabytes per second (TB/s) of memory bandwidth, delivering unprecedented acceleration.

Compared with the previous generation, HGX delivers up to 20x AI acceleration with Tensor Float 32 (TF32) and up to 2.5x HPC acceleration with FP64. NVIDIA HGX delivers a staggering 10 petaFLOPS, making it the most powerful accelerated, vertically scalable server platform for AI and HPC.
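As an illustration (ours, not from the platform documentation): for float32 workloads, TF32 requires no model changes; in PyTorch it is switched on with two backend flags.

import torch

# Allow TF32 Tensor Core maths for float32 matrix multiplications and cuDNN
# convolutions; on Ampere GPUs such as the A100, these operations then run
# on Tensor Cores at TF32 precision with no change to model code.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
c = a @ b  # dispatched to a TF32 Tensor Core kernel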

HGX is extensively tested and easy to deploy, integrating into partner servers to provide guaranteed performance. The platform is available with 4-GPU and 8-GPU HGX motherboards using SXM GPUs, and as PCIe GPUs for a modular deployment option that brings the highest compute performance to mainstream servers.

DEEP LEARNING PERFORMANCE

UP TO 3 TIMES FASTER AI TRAINING
FOR THE LARGEST MODELS

DLRM Training
DLRM on HugeCTR framework, precision = FP16 | NVIDIA A100 80 GB batch size = 48 | NVIDIA A100 40 GB batch size = 32 | NVIDIA V100 32 GB batch size = 32.
The size and complexity of Deep Learning models have exploded, demanding systems with large amounts of memory, massive processing power and fast interconnects for scalability. With extremely fast all-to-all GPU communication through NVIDIA® NVSwitch™, HGX A100 provides enough power for even the most advanced AI models. With A100 80 GB GPUs, the GPU memory is doubled, so a single HGX A100 provides up to 1.3 TB of memory. Steadily growing workloads on the very largest models, such as Deep Learning Recommendation Models (DLRM) with their massive embedding tables, run up to 3 times faster than on HGX systems with A100 40 GB GPUs.
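Communication libraries such as NCCL route this multi-GPU traffic over NVLink/NVSwitch automatically. A minimal sketch (ours; the script name is hypothetical) of the gradient all-reduce at the heart of such training, launched with one process per GPU:

import torch
import torch.distributed as dist

# Minimal sketch: run with `torchrun --nproc_per_node=8 allreduce_demo.py`
# on one HGX node. NCCL carries the collective over NVLink/NVSwitch.
def main():
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)

    # Stand-in for a gradient shard produced by a backward pass.
    grad = torch.full((1 << 20,), float(rank), device="cuda")
    dist.all_reduce(grad, op=dist.ReduceOp.SUM)  # summed across all GPUs
    if rank == 0:
        print("first element after all-reduce:", grad[0].item())
    dist.destroy_process_group()

if __name__ == "__main__":
    main()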

MACHINE LEARNING PERFORMANCE

2 TIMES FASTER THAN A100 40 GB IN BIG DATA ANALYTICS BENCHMARK

Big data analytics benchmark | 30 analytical retail queries, ETL, ML, NLP on a 10 TB dataset | V100 32 GB, RAPIDS/Dask | A100 40 GB and A100 80 GB, RAPIDS/Dask/BlazingSQL

Machine learning models require loading, transforming and processing very large datasets to extract insights. With up to 1.3 TB of unified memory and all-to-all GPU communication via NVSwitch, an HGX A100 with 80 GB GPUs has the power to load and run computations over huge datasets and deliver actionable insights quickly.

In a big data analytics benchmark, the A100 80 GB delivered insights with twice the throughput of the A100 40 GB, making it ideal for growing workloads on ever-expanding datasets.
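The benchmark's software stack centres on RAPIDS and Dask; below is a minimal sketch of that pattern (the dataset path and column names are illustrative assumptions, not part of the benchmark):

import dask_cudf
from dask_cuda import LocalCUDACluster
from dask.distributed import Client

# Minimal sketch: one Dask worker per local GPU, then a GPU-accelerated
# groupby/aggregation with dask_cudf. Paths and columns are assumptions.
cluster = LocalCUDACluster()
client = Client(cluster)

df = dask_cudf.read_parquet("transactions/*.parquet")  # hypothetical dataset
revenue = df.groupby("store_id")["sales"].sum().compute()
print(revenue.head())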

HPC PERFORMANCE

HPC applications need to perform enormous numbers of calculations every second. Drastically increasing the compute density of each server node significantly reduces the number of servers required, yielding major cost savings and cutting the space and energy footprint of the data centre. For HPC simulations and the high-dimensional matrix multiplications behind them, a processor must fetch data from many neighbours, making GPUs interconnected by NVLink ideal. HPC applications can also use TF32 on the A100, achieving up to 11 times higher throughput for single-precision dense matrix-multiply operations.
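As a concrete illustration (ours), a dense double-precision matrix multiplication in CuPy executes as a cuBLAS DGEMM, the kind of kernel the A100's FP64 Tensor Cores accelerate:

import cupy as cp

# Minimal sketch: dense FP64 matrix multiply on the GPU; the problem size
# is an arbitrary assumption for illustration.
n = 8192
a = cp.random.random((n, n), dtype=cp.float64)
b = cp.random.random((n, n), dtype=cp.float64)
c = a @ b                             # executed as a cuBLAS DGEMM
cp.cuda.Stream.null.synchronize()     # wait for the asynchronous kernel
print(float(c[0, 0]))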

An HGX A100 with A100 80 GB GPUs provides a two-fold increase in throughput over A100 40 GB GPUs in Quantum Espresso, a materials simulation, leading to faster insight.

11 TIMES MORE HPC PERFORMANCE IN FOUR YEARS

Leading HPC applications
Geometric mean of application acceleration vs. P100. Benchmark applications: Amber [PME-Cellulose_NVE], Chroma [szscl21_24_128], GROMACS [ADH Dodec], MILC [Apex Medium], NAMD [stmv_nve_cuda], PyTorch [BERT Fast Fine Tuning], Quantum Espresso [AUSURF112-jR], Random Forest FP32 [make_blobs (160000 x 64: 10)], TensorFlow [ResNet-50], VASP 6 [Si Huge] | GPU nodes with dual-socket CPUs with 4x NVIDIA P100, V100 or A100 GPUs.

UP TO 1.8 TIMES FASTER PERFORMANCE
FOR HPC APPLICATIONS

Quantum Espresso
Quantum Espresso measurement with CNT10POR8 dataset, precision = FP64.

TECHNICAL DATA FOR HGX A100

NVIDIA HGX is available as a single motherboard with four or eight A100 GPUs, each with 40 GB or 80 GB of GPU memory. The 4-GPU configuration is fully interconnected with NVIDIA NVLink®, while the 8-GPU configuration is interconnected via NVSwitch. Two HGX A100 8-GPU motherboards can be combined via an NVSwitch interconnect to create a single powerful 16-GPU node.
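To software, this fabric appears as direct peer-to-peer (P2P) access between GPUs. A small sketch (ours, using PyTorch's CUDA utilities) that enumerates it on such a node:

import torch

# Minimal sketch: list the local GPUs and check pairwise peer access,
# which on HGX boards is backed by NVLink/NVSwitch.
n = torch.cuda.device_count()
print(f"{n} GPUs visible")
for i in range(n):
    for j in range(n):
        if i != j and torch.cuda.can_device_access_peer(i, j):
            print(f"GPU {i} can access GPU {j} directly (P2P)")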

HGX is also available in a PCIe form factor, with 40 GB or 80 GB of GPU memory per GPU, as an easy-to-deploy option that brings the highest compute performance to mainstream servers.

This powerful combination of hardware and software lays the foundation for the ultimate AI supercomputing platform.

Configuration | A100 PCIe | 4-GPU | 8-GPU | 16-GPU
GPUs | 1x NVIDIA A100 PCIe | 4x NVIDIA A100 SXM | 8x NVIDIA A100 SXM | 16x NVIDIA A100 SXM
Form factor | PCIe | HGX A100 4-GPU | HGX A100 8-GPU | 2x HGX A100 8-GPU
HPC and AI compute (FP64/TF32*/FP16*/INT8*) | 19.5 TF / 312 TF* / 624 TF* / 1.2 POPS* | 78 TF / 1.25 PF* / 2.5 PF* / 5 POPS* | 156 TF / 2.5 PF* / 5 PF* / 10 POPS* | 312 TF / 5 PF* / 10 PF* / 20 POPS*
GPU memory | 40 or 80 GB per GPU | Up to 320 GB | Up to 640 GB | Up to 1,280 GB
NVLink | Third generation | Third generation | Third generation | Third generation
NVSwitch | N/A | N/A | Second generation | Second generation
NVSwitch GPU-to-GPU bandwidth | N/A | N/A | 600 GB/s | 600 GB/s
Total aggregate bandwidth | 600 GB/s | 2.4 TB/s | 4.8 TB/s | 9.6 TB/s

* With sparsity

HGX A100 WITH NVIDIA NETWORKING

HGX can also incorporate NVIDIA networking to accelerate and offload data transfers and ensure full utilisation of computing resources. Smart adapters and switches reduce latency, increase efficiency, enhance security and simplify data centre automation, accelerating end-to-end application performance.

The data centre is the compute unit of the future, and HPC networking plays an essential role in scaling application performance across it. NVIDIA InfiniBand paves the way with software-defined networking, in-network computing acceleration, remote direct memory access (RDMA), and the fastest speeds and feeds available.
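In practice, multi-node jobs pick up InfiniBand and RDMA through the communication library. The sketch below (ours) shows standard NCCL environment settings; the values are assumptions for a typical ConnectX-based fabric, not a prescription.

import os

# Minimal sketch: NCCL settings often exported before a multi-node launch.
os.environ.setdefault("NCCL_NET", "IB")             # prefer the InfiniBand transport
os.environ.setdefault("NCCL_IB_HCA", "mlx5")        # assumed HCA name prefix
os.environ.setdefault("NCCL_NET_GDR_LEVEL", "PHB")  # allow GPUDirect RDMA up to the PCIe host bridge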

INSIGHT INTO THE NVIDIA AMPERE ARCHITECTURE

Read this technical paper to learn what's new in the NVIDIA Ampere architecture and its implementation in the NVIDIA A100 GPU.