THE Universal System for AI Infrastructure
NVIDIA-certified systems from Supermicro
The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration at any scale for the world's most powerful elastic data centers in AI, data analytics, and HPC. Built on the NVIDIA Ampere architecture, A100 is the engine of NVIDIA's data center platform. A100 delivers up to 20x the performance of the previous generation and can be partitioned into as many as seven GPU instances to adapt dynamically to changing demands. A100 is available in 40 GB and 80 GB memory versions; the A100 80 GB debuts the world's fastest memory bandwidth at over 2 terabytes per second (TB/s) to handle the largest models and data sets.
A100 PCIe Product Brief (PDF, 332 KB)
NVIDIA A100 Datasheet (PDF, 867 KB)
The most powerful end-to-end platform
for AI and HPC in the data center
NVIDIA DGX A100
DEEP LEARNING TRAINING
Up to 3 times faster AI training for the largest models

The complexity of AI models is rapidly increasing to meet new challenges such as conversational AI. Training them requires tremendous computational power and scalability.
NVIDIA A100 Tensor Cores with Tensor Float 32 (TF32) precision deliver up to 20x the performance of NVIDIA Volta with zero code changes, plus an additional 2x boost with automatic mixed precision and FP16. Combined with NVIDIA® NVLink®, NVIDIA NVSwitch™, PCIe Gen4, NVIDIA® Mellanox® InfiniBand®, and the NVIDIA Magnum IO™ SDK, it is possible to scale to thousands of A100 GPUs.
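To make the software side concrete, here is a minimal PyTorch sketch (an illustration, not from the source; model, shapes, and optimizer are placeholders) of the two switches this paragraph refers to: TF32 for matrix math and automatic mixed precision (AMP) for the additional FP16 boost.

```python
import torch

# TF32 is picked up automatically on Ampere GPUs; these flags make the
# choice explicit for matmuls and cuDNN convolutions.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()   # keeps FP16 gradients numerically stable

x = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

with torch.cuda.amp.autocast():        # eligible ops run in FP16/TF32
    loss = torch.nn.functional.mse_loss(model(x), target)

scaler.scale(loss).backward()          # scaled backward pass
scaler.step(optimizer)                 # unscales gradients, applies the update
scaler.update()
```

The point of the sketch is the "no code changes" claim: the network itself is untouched, and AMP adds only a few lines around the training step.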
Training workloads like BERT can be solved at scale in under a minute with 2,048 A100 GPUs, a world record for time to solution.
For the largest models with massive data tables, such as deep learning recommendation models (DLRM), the A100 80 GB delivers up to 1.3 TB of unified memory per node and provides up to 3x the throughput of the A100 40 GB.
NVIDIA cemented its MLPerf leadership with multiple performance records in the industry-wide AI training benchmark.
LEARN MORE ABOUT A100 FOR TRAINING
DEEP LEARNING INFERENCE
The A100 introduces breakthrough features to optimize inference workloads. It accelerates a full range of precisions, from FP32 down to INT4, and Multi-Instance GPU (MIG) technology lets multiple networks run simultaneously on a single A100 GPU for optimal utilization of compute resources. On top of A100's other inference gains, structural sparsity delivers up to 2x more performance.
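As a small, hedged illustration of reduced-precision inference (FP16 shown here; INT8/INT4 deployment typically goes through NVIDIA TensorRT, which is not shown), a PyTorch sketch with a placeholder network:

```python
import torch

# Placeholder network standing in for a real model such as BERT.
model = torch.nn.Sequential(
    torch.nn.Linear(768, 768),
    torch.nn.ReLU(),
    torch.nn.Linear(768, 2),
).cuda().eval()

batch = torch.randn(32, 768, device="cuda")

# inference_mode disables autograd bookkeeping; autocast runs eligible
# ops in FP16 on the A100's Tensor Cores.
with torch.inference_mode(), torch.cuda.amp.autocast(dtype=torch.float16):
    logits = model(batch)
```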
For state-of-the-art conversational AI models such as BERT, the A100 provides up to 249x faster inference throughput over CPUs.
For the most complex models with constrained batch sizes, such as RNN-T for automatic speech recognition, the increased memory capacity of the A100 80 GB doubles the size of each MIG instance and delivers up to 1.25x higher throughput than the A100 40 GB.
NVIDIA demonstrated market-leading inference performance in MLPerf, and the A100 extends that lead with 20x more performance.
LEARN MORE ABOUT A100 FOR INFERENCE


HIGH-PERFORMANCE COMPUTING
To unlock next-generation discoveries, scientists are turning to simulations to better understand the world around us.
NVIDIA A100 introduces double-precision Tensor Cores, the biggest leap in HPC performance since the introduction of GPUs. Combined with 80 GB of the fastest GPU memory, researchers can reduce a double-precision simulation that once took 10 hours to under four hours on A100. HPC applications can also leverage TF32 to achieve up to 11x higher throughput for dense single-precision matrix multiplication.
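A brief sketch (assuming PyTorch with CUDA; matrix sizes are arbitrary) contrasting the two precisions mentioned above: FP64 for simulation kernels and TF32 for single-precision dense matrix multiplies.

```python
import torch

# FP64 matmul: dispatched to cuBLAS, which uses A100's FP64 Tensor Cores.
a64 = torch.randn(4096, 4096, dtype=torch.float64, device="cuda")
b64 = torch.randn(4096, 4096, dtype=torch.float64, device="cuda")
c64 = a64 @ b64

# FP32 inputs with TF32 Tensor Core math enabled.
torch.backends.cuda.matmul.allow_tf32 = True
c32 = a64.float() @ b64.float()
```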
For HPC applications with the largest data sets, the additional memory of the A100 80 GB delivers up to a 2x throughput increase with Quantum Espresso, a materials simulation. This massive memory and unmatched memory bandwidth make the A100 80 GB the ideal platform for next-generation workloads.
LEARN MORE ABOUT A100 FOR HPC


POWERFUL DATA ANALYTICS
Up to 83x faster than CPU and 2x faster than the A100 40 GB on a big data analytics benchmark

Data scientists need to be able to analyze, visualize, and turn large data sets into insights. But scale-out solutions are often bogged down because data sets are scattered across multiple servers.
Accelerated servers with A100 deliver the needed compute power, along with massive memory, over 2 terabytes per second (TB/s) of memory bandwidth, and scalability via NVIDIA® NVLink® and NVSwitch™, to tackle these workloads. Combined with InfiniBand, NVIDIA Magnum IO™, and the RAPIDS™ suite of open-source libraries, including the RAPIDS Accelerator for Apache Spark for GPU-accelerated data analytics, the NVIDIA data center platform accelerates these huge workloads with unmatched performance and efficiency.
In a big data analytics benchmark, the A100 80 GB delivered insights with 83x higher throughput than CPUs and 2x the performance of the A100 40 GB, making it ideally suited to emerging workloads with ever-growing data sets.
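As a hedged sketch of what GPU-accelerated analytics looks like in practice (the file name and column names below are invented for illustration), RAPIDS cuDF exposes a pandas-like API that executes on the GPU; the same pattern scales out across servers via Dask-cuDF or the RAPIDS Accelerator for Apache Spark.

```python
import cudf

# Hypothetical input file and columns, purely for illustration.
df = cudf.read_parquet("transactions.parquet")

# The aggregation runs entirely on the GPU.
summary = (
    df.groupby("customer_id")["amount"]
      .agg(["sum", "mean", "count"])
      .sort_values("sum", ascending=False)
)
print(summary.head())
```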
MORE INFORMATION ON DATA ANALYTICS
ENTERPRISE-READY UTILIZATION
BERT Large Inference

A100 with MIG maximizes the utilization of GPU-accelerated infrastructure. With MIG, an A100 GPU can be partitioned into as many as seven independent instances, giving multiple users simultaneous access to GPU acceleration. On the A100 40 GB, each MIG instance can be allocated up to 5 GB; with the A100 80 GB's increased memory capacity, that size doubles to 10 GB.
MIG works with Kubernetes, containers, and hypervisor-based server virtualization. MIG lets infrastructure managers assign a right-sized GPU with guaranteed quality of service (QoS) to every job, extending accelerated computing resources to every user.
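For a concrete picture of partitioning in practice, here is a hedged Python sketch that shells out to nvidia-smi using commands from NVIDIA's MIG user guide. Profile IDs vary by GPU (19 is the smallest 1g profile on A100); verify them on your own system with `nvidia-smi mig -lgip`, and note that these steps require root privileges.

```python
import subprocess

def run(cmd: str) -> None:
    """Echo and execute a single nvidia-smi command."""
    print(f"$ {cmd}")
    subprocess.run(cmd.split(), check=True)

run("nvidia-smi -i 0 -mig 1")           # enable MIG mode on GPU 0
run("nvidia-smi mig -lgip")             # list available GPU instance profiles
run("nvidia-smi mig -cgi 19,19,19 -C")  # create three 1g instances plus compute instances
run("nvidia-smi -L")                    # MIG devices now show up as separate entries
```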
MORE INFORMATION ABOUT MIG
NVIDIA & SUPERMICRO
Get the best out of your systems
Our recommendations from NVIDIA and Supermicro offer something for every project and every budget!

NVIDIA DGX A100
The universal system for AI infrastructure
THE WORLD'S FIRST AI SYSTEM BASED ON NVIDIA A100
Watch video | Download datasheet | Download brochure


NVIDIA DGX STATION A100
Workgroup appliance for the AI era
AI DATA CENTER-IN-A-BOX
Download infographic | Download datasheet

Your direct line to the experts at sysGen!
GPUs FOR DATA CENTERS

TECHNICAL DATA
| | NVIDIA A100 for NVLink | NVIDIA A100 for PCIe |
|---|---|---|
| Peak FP64 | 9.7 TF | 9.7 TF |
| Peak FP64 Tensor Core | 19.5 TF | 19.5 TF |
| Peak FP32 | 19.5 TF | 19.5 TF |
| Tensor Float 32 (TF32) | 156 TF \| 312 TF* | 156 TF \| 312 TF* |
| Peak BFLOAT16 Tensor Core | 312 TF \| 624 TF* | 312 TF \| 624 TF* |
| Peak FP16 Tensor Core | 312 TF \| 624 TF* | 312 TF \| 624 TF* |
| Peak INT8 Tensor Core | 624 TOPS \| 1,248 TOPS* | 624 TOPS \| 1,248 TOPS* |
| Peak INT4 Tensor Core | 1,248 TOPS \| 2,496 TOPS* | 1,248 TOPS \| 2,496 TOPS* |
| GPU memory | 40 GB / 80 GB | 40 GB |
| GPU memory bandwidth | 1,555 GB/s / 2,039 GB/s | 1,555 GB/s |
| Interconnect | NVIDIA NVLink 600 GB/s**; PCIe Gen4 64 GB/s | NVIDIA NVLink 600 GB/s**; PCIe Gen4 64 GB/s |
| Multi-Instance GPU (MIG) | Various instance sizes with up to 7 MIGs at 10 GB | Various instance sizes with up to 7 MIGs at 5 GB |
| Form factor | 4/8 SXM on NVIDIA HGX™ A100 | PCIe |
| Max. TDP power | 400 W | 250 W |

* With sparsity
** SXM GPUs via HGX A100 server boards; PCIe GPUs via NVLink Bridge for up to 2 GPUs

NVIDIA RTX Workstation GPUs

NVIDIA Ampere Architecture Insights
Learn what's new with the NVIDIA Ampere architecture and its implementation in the NVIDIA A100 GPU.