ACCELERATING THE MOST IMPORTANT WORK OF OUR TIME

The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration at every scale, powering the world's highest-performing elastic data centers for AI, data analytics, and HPC. Built on the NVIDIA Ampere architecture, A100 is the engine of NVIDIA's data center platform. A100 delivers up to 20x the performance of the previous generation and can be partitioned into seven GPU instances to adapt dynamically to changing demands. The A100 80 GB debuts the world's fastest memory bandwidth at over 2 terabytes per second (TB/s) to run the largest models and datasets.

ENTERPRISE-READY SOFTWARE FOR AI

The NVIDIA EGX™ platform includes optimized software that delivers accelerated computing across the infrastructure. With NVIDIA AI Enterprise, businesses get a cloud-native, end-to-end suite of AI and data analytics software that is optimized, certified, and supported by NVIDIA to run on VMware vSphere with NVIDIA-Certified Systems. NVIDIA AI Enterprise includes key enabling technologies from NVIDIA for rapid deployment, management, and scaling of AI workloads in the modern hybrid cloud.

THE MOST POWERFUL END-TO-END DATA CENTER PLATFORM FOR AI AND HPC

A100 is part of the complete NVIDIA data center solution that incorporates building blocks across hardware, networking, software, libraries, and optimized AI models and applications from NGC™. Representing the most powerful end-to-end AI and HPC platform for data centers, it allows researchers to deliver real-world results and deploy solutions into production at scale.

UP TO 3 TIMES FASTER AI TRAINING WITH THE LARGEST MODELS

[Chart: DLRM training performance, A100 80 GB vs. A100 40 GB vs. V100 32 GB]

DLRM on HugeCTR framework, precision = FP16 | NVIDIA A100 80 GB batch size = 48 | NVIDIA A100 40 GB batch size = 32 | NVIDIA V100 32 GB batch size = 32.

DEEP LEARNING TRAINING

The complexity of AI models is increasing rapidly to meet new challenges such as conversational AI. Training them requires tremendous computational power and scalability.

NVIDIA A100's Tensor Cores with Tensor Float 32 (TF32) precision deliver up to 20x higher performance than NVIDIA Volta with zero code changes, plus an additional 2x boost with automatic mixed precision and FP16. Combined with NVIDIA® NVLink®, NVIDIA NVSwitch™, PCIe Gen4, NVIDIA® Mellanox® InfiniBand®, and the NVIDIA Magnum IO™ SDK, training can scale to thousands of A100 GPUs.
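To illustrate, here is a minimal sketch of one mixed-precision training step, assuming PyTorch; the model, data, and learning rate are placeholders. On Ampere GPUs, operations inside the autocast region run in FP16 on Tensor Cores where numerically safe, and remaining FP32 matrix multiplies can use TF32 automatically, with no changes to model code.

    import torch

    # Placeholder model and data; any nn.Module works the same way.
    model = torch.nn.Linear(1024, 1024).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler()  # loss scaling guards against FP16 underflow

    x = torch.randn(64, 1024, device="cuda")
    target = torch.randn(64, 1024, device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():  # ops run in FP16 where numerically safe
        loss = torch.nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()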

Training workloads like BERT can be solved at scale in under a minute with 2,048 A100 GPUs, a world record for time to solution.

For the largest models with massive data tables, such as deep learning recommendation models (DLRMs), the A100 80 GB provides up to 1.3 TB of unified memory per node and delivers up to 3x more throughput than the A100 40 GB.

Multiple performance records in MLPerf, the industry-wide benchmark for AI training, have cemented NVIDIA's leadership.

DEEP LEARNING INFERENCE

A100 introduces groundbreaking features to optimize inference workloads. It accelerates a full range of precisions, from FP32 down to INT4, and Multi-Instance GPU (MIG) technology lets multiple networks run simultaneously on a single GPU for optimal utilization of compute resources. Structural sparsity adds up to 2x more performance on top of A100's other inference gains. On AI models like BERT, A100 delivers inference throughput up to 249x higher than CPUs. For complex, memory-hungry models like RNN-T, the A100 80 GB's increased capacity doubles the size of each MIG instance and delivers up to 1.25x more throughput than the A100 40 GB. NVIDIA set the pace for inference performance in MLPerf; the A100 extends that lead with 20x more performance.
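As a sketch of how reduced precision and sparsity are enabled in practice, the following assumes the TensorRT Python API (8.x) and a hypothetical ONNX model file; INT8 additionally requires calibration data or quantization-aware-trained scales, which are omitted here.

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, logger)
    with open("bert_large.onnx", "rb") as f:  # hypothetical model path
        parser.parse(f.read())

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.INT8)            # reduced-precision inference
    config.set_flag(trt.BuilderFlag.SPARSE_WEIGHTS)  # 2:4 structured sparsity kernels
    engine = builder.build_serialized_network(network, config)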

UP TO 249 TIMES HIGHER PERFORMANCE WITH AI INFERENCE COMPARED TO CPUS

[Chart: BERT-Large inference performance, A100 vs. V100 vs. CPU]

BERT-Large Inference | CPU only: Xeon Gold 6240 @ 2.60 GHz, precision = FP32, batch size = 128 | V100: NVIDIA TensorRT™ (TRT) 7.2, precision = INT8, batch size = 256 | A100 40 GB and 80 GB: batch size = 256, precision = INT8 with sparsity.

UP TO 1.25 TIMES HIGHER PERFORMANCE WITH AI INFERENCE COMPARED TO A100 40 GB

[Chart: RNN-T single-stream inference, A100 80 GB vs. A100 40 GB]

MLPerf 0.7 RNN-T measured with (1/7) MIG slices. Framework: TensorRT 7.2, dataset = LibriSpeech, precision = FP16.

HIGH-PERFORMANCE COMPUTING

To unlock next-generation discoveries, scientists rely on simulations with NVIDIA A100 to better understand the world around us. A100 introduces double-precision Tensor Cores, paired with 80 GB of the fastest GPU memory, so researchers can cut a ten-hour double-precision simulation to under four hours and achieve up to 11x higher throughput on single-precision dense matrix-multiply operations. With its added memory and unmatched bandwidth, the A100 80 GB is ideal for next-generation HPC workloads.

Learn more about the A100 for HPC
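As a minimal illustration of the kind of kernel involved, the sketch below runs a dense double-precision matrix multiply with CuPy (an assumption; any CUDA math library would do); on A100 it dispatches to cuBLAS, which can use the FP64 Tensor Cores.

    import cupy as cp

    n = 4096  # illustrative size only
    a = cp.random.random((n, n), dtype=cp.float64)
    b = cp.random.random((n, n), dtype=cp.float64)

    c = a @ b  # dense FP64 GEMM executed on the GPU via cuBLAS
    cp.cuda.Stream.null.synchronize()  # wait for the asynchronous kernel to finish
    print(float(c[0, 0]))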

11 times more HPC performance in four years

[Chart: leading HPC application speedups, A100 vs. V100 vs. P100]

Geometric mean of application speedup vs. P100. Benchmark applications: Amber [PME-Cellulose_NVE], Chroma [szscl21_24_128], GROMACS [ADH Dodec], MILC [Apex Medium], NAMD [stmv_nve_cuda], PyTorch [BERT Fast Fine Tuning], Quantum Espresso [AUSURF112-jR], Random Forest FP32 [make_blobs (160000 x 64:10)], TensorFlow [ResNet-50], VASP 6 [Si Huge] | GPU nodes with dual-socket CPUs and 4x NVIDIA P100, V100, or A100 GPUs.

Up to 1.8 times higher performance for HPC applications

[Chart: Quantum Espresso performance, A100 80 GB vs. A100 40 GB]

Quantum Espresso measured using the CNT10POR8 dataset, precision = FP64.

POWERFUL DATA ANALYTICS

Data scientists need to be able to analyze, visualize, and turn massive datasets into insights. Scale-out solutions often bog down because those datasets are scattered across multiple servers. Servers equipped with A100 deliver the compute power, massive memory, more than 2 TB/s of memory bandwidth, and the scalability to tackle these workloads. Combined with InfiniBand, NVIDIA Magnum IO, and the RAPIDS suite of open-source libraries, the NVIDIA data center platform accelerates these huge workloads with unmatched performance and efficiency. On a big data analytics benchmark, the A100 80 GB delivered insights with 83x higher throughput than CPUs and 2x the performance of the A100 40 GB, making it ideal for emerging workloads with ever-growing datasets.


Learn more about data analytics
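For a sense of the programming model, here is a minimal RAPIDS sketch assuming cuDF; the Parquet file and column names are hypothetical. The pandas-style API runs entirely on the GPU.

    import cudf

    # Hypothetical retail dataset; file path and column names are placeholders.
    df = cudf.read_parquet("retail_sales.parquet")
    top_stores = (
        df.groupby("store_id")["revenue"]
          .sum()
          .sort_values(ascending=False)
          .head(10)
    )
    print(top_stores)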

Up to 83x faster than CPU, 2x faster than A100 40 GB in Big Data analytics benchmark

Big data analytics benchmark | 30 analytics retail queries, ETL, ML, NLP on 10 TB dataset | CPU: Intel Xeon Gold 6252 @ 2.10 GHz, Hadoop | V100 32 GB, RAPIDS/Dask | A100 40 GB and A100 80 GB, RAPIDS/Dask/BlazingSQL


ENTERPRISE-READY UTILIZATION

7 TIMES HIGHER INFERENCE THROUGHPUT WITH MULTI-INSTANCE GPU (MIG)

[Chart: BERT-Large inference throughput with MIG, A100 vs. V100 vs. T4]

BERT-Large Inference | NVIDIA TensorRT™ (TRT) 7.1 | NVIDIA T4 Tensor Core GPU: TRT 7.1, precision = INT8, batch size = 256 | V100: TRT 7.1, precision = FP16, batch size = 256 | A100 with 1 or 7 MIG instances of 1g.5gb: batch size = 94, precision = INT8 with sparsity.

A100 with MIG maximizes the utilization of GPU-accelerated infrastructure. With MIG, an A100 GPU can be partitioned into as many as seven independent instances, giving multiple users simultaneous access to GPU acceleration. On the A100 40 GB, each MIG instance can be allocated up to 5 GB of memory; the A100 80 GB's doubled memory capacity raises this to 10 GB.

MIG works with Kubernetes, containers, and hypervisor-based server virtualization. It lets infrastructure managers assign a right-sized GPU to each job with guaranteed quality of service (QoS), extending the reach of accelerated computing resources to every user.
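As a minimal sketch of how a job is pinned to one MIG instance, the following uses the driver's nvidia-smi tool and the standard CUDA_VISIBLE_DEVICES mechanism; the UUID shown is a placeholder to be copied from the listing.

    import os
    import subprocess

    # "nvidia-smi -L" lists GPUs and their MIG instances, one per line,
    # each with a UUID of the form MIG-<...>.
    listing = subprocess.run(["nvidia-smi", "-L"],
                             capture_output=True, text=True).stdout
    print(listing)

    # Pin this process to a single MIG instance before any CUDA context
    # is created; replace the placeholder with a UUID from the listing.
    os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"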


Learn more about MIG

Transform your AI workloads with the NVIDIA DGX A100

Experience the power of AI acceleration with the NVIDIA DGX A100. This leading AI platform delivers exceptional performance to accelerate and simplify complex AI workloads. The DGX A100 features the latest NVIDIA A100 Tensor Core GPU technology that delivers amazing performance and efficiency for Deep Learning, HPC, and Analytics workloads. Use the DGX A100 to achieve your AI goals and take your business to the next level.

GPUS FOR DATA CENTERS


NVIDIA A100 FOR HGX

Ultimate performance for all workloads.

NVIDIA A100 FOR PCIE

Maximum versatility for all workloads.

TECHNICAL DATA

                                   A100 80 GB PCIe                      A100 80 GB SXM
FP64                               9.7 TFLOPS                           9.7 TFLOPS
FP64 Tensor Core                   19.5 TFLOPS                          19.5 TFLOPS
FP32                               19.5 TFLOPS                          19.5 TFLOPS
Tensor Float 32 (TF32)             156 TFLOPS | 312 TFLOPS*             156 TFLOPS | 312 TFLOPS*
BFLOAT16 Tensor Core               312 TFLOPS | 624 TFLOPS*             312 TFLOPS | 624 TFLOPS*
FP16 Tensor Core                   312 TFLOPS | 624 TFLOPS*             312 TFLOPS | 624 TFLOPS*
INT8 Tensor Core                   624 TOPS | 1,248 TOPS*               624 TOPS | 1,248 TOPS*
GPU memory                         80 GB HBM2e                          80 GB HBM2e
GPU memory bandwidth               1,935 GB/s                           2,039 GB/s
Max. thermal design power (TDP)    300 W                                400 W***
Multi-Instance GPU                 Up to 7 MIGs with 10 GB each         Up to 7 MIGs with 10 GB each
Form factor                        PCIe (dual-slot air-cooled or        SXM
                                   single-slot liquid-cooled)
Interconnect                       NVIDIA® NVLink® bridge for           NVLink: 600 GB/s
                                   2 GPUs: 600 GB/s**                   PCIe Gen4: 64 GB/s
                                   PCIe Gen4: 64 GB/s
Server options                     NVIDIA-Certified Systems™            NVIDIA HGX™ A100 partners and
                                   with 1-8 GPUs                        NVIDIA-Certified Systems with
                                                                        4, 8, or 16 GPUs; NVIDIA DGX™
                                                                        A100 with 8 GPUs

* With sparsity
** SXM4 GPUs via HGX A100 server boards; PCIe GPUs via NVLink bridge for up to two GPUs
*** 400 W TDP for the standard configuration. The HGX A100-80 GB custom thermal solution (CTS) SKU can support TDPs up to 500 W