ACCELERATING THE MOST IMPORTANT WORK OF OUR TIME
The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration at any scale for AI, data analytics and high-performance computing (HPC) to tackle the world's toughest computing challenges. As the engine of the NVIDIA data centre platform, the A100 can efficiently scale to thousands of GPUs or be partitioned into seven GPU instances with NVIDIA Multi-Instance GPU (MIG) technology to accelerate workloads of any size. And third-generation Tensor Cores accelerate any precision for a wide range of workloads, reducing time to insight and time to market.
THE MOST POWERFUL END-TO-END AI AND HPC DATA CENTRE PLATFORM
A100 is part of NVIDIA's complete data centre solution, which spans building blocks across hardware, networking, software, libraries, and optimised AI models and applications from NGC™. As the most powerful end-to-end AI and HPC platform for data centres, it enables researchers to deliver real-world results and deploy solutions at scale in production.
SPECIFICATION

| NVIDIA A100 for NVLink | |
|---|---|
| Peak FP64 | 9.7 TF |
| Peak FP64 Tensor Core | 19.5 TF |
| Peak FP32 | 19.5 TF |
| Peak TF32 Tensor Core | 156 TF \| 312 TF* |
| Peak BFLOAT16 Tensor Core | 312 TF \| 624 TF* |
| Peak FP16 Tensor Core | 312 TF \| 624 TF* |
| Peak INT8 Tensor Core | 624 TOPS \| 1,248 TOPS* |
| Peak INT4 Tensor Core | 1,248 TOPS \| 2,496 TOPS* |
| GPU Memory | 40 GB |
| GPU Memory Bandwidth | 1,555 GB/s |
| Interconnect | NVIDIA NVLink 600 GB/s; PCIe Gen4 64 GB/s |
| Multi-Instance GPU | Various instance sizes with up to 7 MIGs @ 5 GB |
| Form Factor | 4/8 SXM on NVIDIA HGX™ A100 |
| Max TDP Power | 400 W |

\* With sparsity
UP TO 6X HIGHER PERFORMANCE WITH TF32 FOR AI TRAINING

DEEP LEARNING TRAINING
AI models are becoming more complex as they face the next challenges, such as accurate conversational AI and deep recommendation systems. Training these models requires massive computing power and scalability.
The third-generation NVIDIA A100 Tensor Cores with Tensor Float 32 (TF32) precision deliver up to 20x higher performance over the previous generation with zero code changes, and a further 2x with automatic mixed precision and FP16. Combined with third-generation NVIDIA® NVLink®, NVIDIA NVSwitch™, PCIe Gen4, NVIDIA Mellanox InfiniBand and the NVIDIA Magnum IO™ software SDK, it is possible to scale to thousands of A100 GPUs. This means that large AI models such as BERT can be trained on a cluster of 1,024 A100s in as little as 37 minutes, offering unprecedented performance and scalability.
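As an illustration of the two levers described above, the sketch below enables TF32 for FP32 work and adds automatic mixed precision with FP16. PyTorch is not named in this datasheet; it stands in here for any framework with A100 support, and the tiny model and training loop are made-up examples.

```python
import torch

# On A100 (Ampere), cuBLAS/cuDNN can route FP32 work through the TF32
# Tensor Cores. These flags are PyTorch's switches for that behaviour;
# no model code changes are required.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # loss scaling keeps FP16 stable

x = torch.randn(64, 1024, device="cuda")
target = torch.randn(64, 1024, device="cuda")

for _ in range(10):
    optimizer.zero_grad()
    # Automatic mixed precision: eligible ops run in FP16 on Tensor Cores,
    # while master weights stay in FP32.
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

The TF32 path is the "zero code changes" speedup; the autocast/GradScaler pair is the additional 2x the text attributes to mixed precision with FP16.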
NVIDIA's leadership in training was demonstrated in MLPerf 0.6, the first industry-wide benchmark for AI training.
DEEP LEARNING INFERENCE
The A100 offers groundbreaking new features to optimise inference workloads. It delivers unprecedented versatility by accelerating the full range of precisions, from FP32 and FP16 down to INT8 and INT4. Multi-Instance GPU (MIG) technology lets multiple networks run simultaneously on a single A100 GPU for optimal utilisation of compute resources. And support for structural sparsity delivers up to 2x more performance on top of the A100's other inference gains.
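As a minimal sketch of reduced-precision inference, assuming PyTorch as the framework and a made-up model (the INT8/INT4 and structural-sparsity paths are typically reached through quantisation and pruning toolchains rather than a one-line switch):

```python
import torch

# Hypothetical trained model; any nn.Module works the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 10),
).cuda().eval()

x = torch.randn(32, 512, device="cuda")

# FP16 inference: under autocast, eligible ops execute on the
# FP16 Tensor Cores; gradients are disabled for inference.
with torch.no_grad(), torch.cuda.amp.autocast():
    logits = model(x)
print(logits.shape)  # torch.Size([32, 10])
```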
NVIDIA already delivers market-leading inference performance, as demonstrated in MLPerf Inference 0.5, the first industry-wide benchmark for inference. The A100 brings 20x more performance to extend that leadership further.
UP TO 7X HIGHER PERFORMANCE WITH MULTI-INSTANCE GPU (MIG) FOR AI INFERENCE

9X MORE HPC PERFORMANCE IN 4 YEARS

HIGH PERFORMANCE COMPUTING (HPC)
To enable next-generation discoveries, scientists are looking to simulations to better understand complex molecules for drug discovery, physics for potential new energy sources and atmospheric data to better predict and prepare for extreme weather patterns.
The A100 introduces double-precision Tensor Cores, marking the biggest milestone since the introduction of double-precision computing in GPUs for HPC. It allows researchers to reduce a 10-hour double-precision simulation running on NVIDIA V100 Tensor Core GPUs to just four hours on the A100. HPC applications can also take advantage of the TF32 precision in the A100's Tensor Cores to achieve up to 10x higher throughput for single-precision dense matrix multiplication operations.
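A rough way to see the double-precision Tensor Cores at work is to time a plain FP64 matrix multiply. Assuming PyTorch on top of cuBLAS (an illustrative choice, not named in this datasheet), no code changes are needed; the library routes the operation to the FP64 Tensor Cores on A100:

```python
import torch

# A plain FP64 matrix multiply; on A100, cuBLAS can execute this on the
# double-precision Tensor Cores with no changes to the calling code.
n = 4096
a = torch.randn(n, n, dtype=torch.float64, device="cuda")
b = torch.randn(n, n, dtype=torch.float64, device="cuda")

torch.cuda.synchronize()
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
c = a @ b
end.record()
torch.cuda.synchronize()

ms = start.elapsed_time(end)
tflops = 2 * n**3 / (ms / 1e3) / 1e12  # a dense matmul costs ~2*n^3 FLOPs
print(f"{ms:.2f} ms, ~{tflops:.1f} TFLOPS FP64")
```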
HIGH PERFORMANCE DATA ANALYTICS
Customers need to be able to analyse, visualise and turn huge amounts of data into insights. But scale-out solutions often get bogged down because these data sets are scattered across multiple servers.
Accelerated servers with A100 deliver the needed compute power, along with 1.6 terabytes per second (TB/s) of memory bandwidth and scalability through third-generation NVLink and NVSwitch, to handle these massive workloads. Combined with NVIDIA Mellanox InfiniBand, the Magnum IO SDK and the RAPIDS suite of open-source software libraries, including the RAPIDS Accelerator for Apache Spark for GPU-accelerated data analytics, the NVIDIA data centre platform is uniquely capable of accelerating these enormous workloads with unprecedented performance and efficiency.
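A small cuDF sketch of the kind of GPU-resident group-by aggregation RAPIDS accelerates; the data below is a made-up example, and a real pipeline would load it with cudf.read_csv or cudf.read_parquet:

```python
import cudf

# Hypothetical event log held directly in GPU memory.
df = cudf.DataFrame({
    "user":  ["a", "b", "a", "c", "b", "a"],
    "value": [10, 5, 7, 3, 8, 2],
})

# The group-by and aggregations run entirely on the GPU, with a
# pandas-like API.
summary = df.groupby("user").agg({"value": ["sum", "mean"]})
print(summary)
```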

ENTERPRISE-READY UTILISATION
A100 with MIG maximises the use of GPU-accelerated infrastructure like never before. With MIG, an A100 GPU can be partitioned into up to seven independent instances, giving multiple users access to GPU acceleration for their applications and development projects. MIG works with Kubernetes, containers and hypervisor-based server virtualisation with NVIDIA Virtual Compute Server (vComputeServer). MIG enables infrastructure managers to offer a right-sized GPU for each job with guaranteed quality of service (QoS), optimising utilisation and extending the reach of accelerated compute resources to each user.
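For a sense of how a process targets a single MIG instance: the CUDA runtime exposes each instance by UUID through CUDA_VISIBLE_DEVICES. A minimal sketch, assuming PyTorch as the consuming framework; the UUID below is a placeholder, and `nvidia-smi -L` lists the real ones on a given system:

```python
import os

# Select one MIG instance by its UUID (placeholder; substitute a real
# UUID reported by `nvidia-smi -L`). This must be set before the CUDA
# runtime initialises.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

# Import after setting the variable so this process sees only the chosen
# instance: an isolated slice of compute and memory with guaranteed QoS.
import torch
print(torch.cuda.device_count())      # 1: just the selected MIG instance
print(torch.cuda.get_device_name(0))
```

Because each instance is addressed like an ordinary GPU, schedulers such as Kubernetes can hand out right-sized slices to different users without the workloads interfering with one another.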