THE HEART OF AI AND HPC IN THE MODERN DATA CENTER

Solving the world's most important scientific, industrial, and business challenges with AI and HPC. Visualizing complex content to develop innovative products, tell compelling stories, and reinvent the cities of the future. Extracting new insights from massive data sets. Designed for the age of Elastic Computing, the NVIDIA Ampere architecture addresses all of these challenges, delivering unmatched acceleration at any scale.

NVIDIA GPUs and converged accelerators are purpose-built for large-scale deployment, bringing networking, security, and a small footprint to the cloud, data center, and edge.

BREAKTHROUGH INNOVATIONS

The NVIDIA Ampere architecture comprises 54 billion transistors and is the largest 7-nanometer (nm) chip ever built. It features six key breakthrough innovations.

THIRD-GENERATION TENSOR CORES

First introduced in the NVIDIA Volta™ architecture, NVIDIA Tensor Core technology has dramatically accelerated AI, reducing training times from weeks to hours and massively speeding up inference. The NVIDIA Ampere architecture builds on these innovations and introduces new precisions, Tensor Float 32 (TF32) and floating point 64 (FP64), to accelerate and simplify AI adoption and extend the power of Tensor Cores to HPC.

TF32 works just like FP32 while delivering up to 20x higher performance for AI, without requiring any code changes. Using NVIDIA Automatic Mixed Precision, researchers can gain an additional 2x speedup with FP16 by adding just a few lines of code. And with support for bfloat16, INT8, and INT4, Tensor Cores in NVIDIA Ampere architecture GPUs are an incredibly versatile accelerator for both AI training and inference. Bringing the power of Tensor Cores to HPC, the A100 and A30 GPUs also enable matrix operations in full, IEEE-compliant FP64 precision.
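To illustrate why TF32 needs no code changes: it keeps FP32's 8-bit exponent range but carries only 10 explicit mantissa bits instead of 23. A minimal Python sketch of that precision reduction (truncation is used here for simplicity, where the hardware rounds to nearest):

```python
import struct

def round_to_tf32(x: float) -> float:
    """Emulate TF32 precision by dropping FP32's 13 low mantissa bits."""
    # Reinterpret the float as its 32-bit pattern, zero the 13 low
    # mantissa bits (23 FP32 bits - 10 TF32 bits), and convert back.
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    bits &= ~((1 << 13) - 1)
    return struct.unpack('<f', struct.pack('<I', bits))[0]
```

Because the exponent is untouched, any value representable in FP32 stays in range; only the last few digits of the mantissa are lost, which is why TF32 can stand in for FP32 in deep learning workloads.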

LEARN MORE ABOUT TENSOR CORES

MULTI-INSTANCE GPU (MIG)

Every AI and HPC application can benefit from acceleration, but not every application needs the power of an entire GPU. Multi-Instance GPU (MIG) is a feature that lets A100 and A30 GPUs be shared by multiple workloads. With MIG, each GPU can be partitioned into multiple GPU instances that run securely and in complete isolation at the hardware level, each with its own high-bandwidth memory, cache, and compute cores. Developers gain breakthrough acceleration for all applications, large and small, with guaranteed quality of service. IT administrators can offer right-sized GPU acceleration for optimal utilization and extend access to every user and application, in both bare-metal and virtualized environments.
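The partitioning above can be pictured as slice accounting. The sketch below models an A100 40GB, which exposes 7 compute slices and 8 memory slices; the profile names are real MIG profiles, but the simple additive check is an illustration only and ignores the placement rules the hardware additionally enforces:

```python
# MIG profiles on an A100 40GB, mapped to the (compute slices,
# memory slices) each instance consumes. Illustrative model only.
PROFILES = {
    "1g.5gb":  (1, 1),
    "2g.10gb": (2, 2),
    "3g.20gb": (3, 4),
    "4g.20gb": (4, 4),
    "7g.40gb": (7, 8),
}

def fits(instances):
    """Return True if the requested instances fit within one GPU's slices."""
    compute = sum(PROFILES[p][0] for p in instances)
    memory = sum(PROFILES[p][1] for p in instances)
    return compute <= 7 and memory <= 8
```

For example, seven 1g.5gb instances fill the GPU exactly, while two 4g.20gb instances would need eight compute slices and cannot coexist.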

LEARN MORE ABOUT MIG

THIRD-GENERATION NVLINK

Scaling applications across multiple GPUs requires extremely fast data movement. The third generation of NVIDIA® NVLink® in the NVIDIA Ampere architecture doubles the direct bandwidth between GPUs to 600 gigabytes per second (GB/s). This is nearly 10 times higher than PCIe Gen4. Combined with the latest generation NVIDIA NVSwitch™, all GPUs on the server can communicate with each other at full NVLink speed and transfer data extremely fast.

NVIDIA DGX™ A100 and servers from other leading computer makers leverage NVLink and NVSwitch technologies via NVIDIA HGX™ A100 baseboards for greater scalability in HPC and AI workloads.

LEARN MORE ABOUT NVLINK

STRUCTURAL SPARSITY

Modern AI networks are large and getting larger, with millions and in some cases billions of parameters. Not all of these parameters are needed for accurate predictions, and some can be converted to zeros, making the models "sparse" without sacrificing accuracy. Tensor Cores can deliver up to 2x higher performance for sparse models. While the sparsity feature most directly benefits AI inference, it can also be used to improve the performance of model training.
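The pattern the hardware accelerates is fine-grained 2:4 structured sparsity: in every group of four weights, two are zero. A pure-Python sketch of such pruning (keeping the two largest-magnitude weights per group; this is an illustration, not NVIDIA's pruning tooling):

```python
def prune_2_to_4(weights):
    """Zero the 2 smallest-magnitude entries in each group of 4 weights."""
    assert len(weights) % 4 == 0, "2:4 sparsity operates on groups of four"
    pruned = []
    for i in range(0, len(weights), 4):
        group = weights[i:i + 4]
        # Indices of the two largest-magnitude values in this group.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return pruned
```

Because exactly half of each group is zero in a fixed pattern, the hardware can skip those multiplications and store the matrix compressed, which is where the up-to-2x speedup comes from.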

LEARN MORE

SECOND-GENERATION RT CORES

The second-generation RT Cores of the NVIDIA Ampere architecture in the NVIDIA A40 and A10 GPUs dramatically speed up workloads such as photorealistic rendering of movie content, architectural design evaluation, and virtual prototyping of product designs. RT Cores also accelerate the rendering of ray-traced motion blur for faster results with greater visual accuracy, and can run ray tracing concurrently with shading or denoising.

LEARN MORE ABOUT RAY TRACING

SMARTER, FASTER MEMORY

The A100 brings massive compute capacity to data centers. To keep that capacity fully utilized, it offers 2 terabytes per second (TB/s) of memory bandwidth, double that of the previous generation. It also has significantly more on-chip memory, including a 40 megabyte (MB) Level 2 cache, roughly seven times larger than the previous generation's, to maximize compute performance.

NVIDIA Data Center GPUs enable researchers to deliver real-world results and deploy solutions into production at scale

NVIDIA® professional GPUs power everything from stunning industrial design to advanced special effects to complex scientific visualization, and are recognized as the world's leading platform for visual computing. Millions of creative and technical professionals rely on NVIDIA professional GPUs to accelerate their workflows. No other platform offers as advanced an ecosystem of hardware, software, tools, and ISV support to turn today's challenges into tomorrow's success stories.

NVIDIA-Certified Systems with NVIDIA A2, A30, and A100 Tensor Core GPUs and NVIDIA AI, including NVIDIA Triton™ Inference Server, open-source inference serving software, deliver breakthrough inference performance at the edge, in the data center, and in the cloud. They ensure that AI-enabled applications can be deployed with fewer servers and less power, resulting in simpler deployments and faster insights at significantly lower cost.

NVIDIA A2

The NVIDIA A2 Tensor Core GPU delivers entry-level inference with low power, small footprint, and high performance for intelligent video analytics (IVA) or NVIDIA AI at the Edge. With a low-profile PCIe Gen4 card and a low configurable Thermal Design Power (TDP) of 40-60 watts, the A2 brings versatile inferencing acceleration to any server.

The A2's versatility, compact size, and low power consumption make it ideal for large-scale edge deployments, enabling existing entry-level CPU servers to be upgraded immediately for inferencing. Servers accelerated with A2 GPUs deliver up to 20x higher inference performance than CPUs and 1.3x more efficient IVA deployments than previous GPU generations, all at an entry-level price.

NVIDIA A30

Bring accelerated performance to every enterprise workload with the NVIDIA A30 Tensor Core GPU. With NVIDIA Ampere architecture Tensor Cores and Multi-Instance GPU (MIG), the A30 delivers assured speedups across diverse workloads, including large-scale AI inference and high-performance computing (HPC) applications. By combining fast memory bandwidth and low power consumption in a PCIe form factor optimized for mainstream servers, the A30 enables an elastic data center and delivers maximum value to enterprises.

The NVIDIA A30 Tensor Core GPU provides a versatile platform for mainstream enterprise workloads such as AI inference, training, and HPC. With TF32 and FP64 Tensor Core support and an end-to-end software and hardware solution stack, the A30 ensures that mainstream AI training and HPC applications run fast. Multi-Instance GPU (MIG) guarantees quality of service (QoS) with secure, hardware-partitioned GPU instances right-sized for each workload and user, making optimal use of the GPU's compute resources.

NVIDIA A16

Unprecedented user experience and density for graphics-intensive VDI

Reach a new dimension of remote work with the NVIDIA A16, the ideal GPU for VDI with high user density and strong graphics performance. Based on the latest NVIDIA Ampere architecture, the A16 is purpose-built for the highest user density, supporting up to 64 concurrent users per board in a dual-slot form factor. Combined with NVIDIA Virtual PC (vPC) software, it provides the power needed to tackle any project from anywhere, offering twice the user density of the previous generation while ensuring an optimal user experience.

NVIDIA A100 - 80GB

The NVIDIA A100 Tensor Core GPU powers the world's highest-performing elastic data centers for AI, data analytics, and high-performance computing (HPC) applications. As the engine of NVIDIA's data center platform, the A100 delivers up to 20x higher performance than the previous-generation NVIDIA Volta. It can be scaled up efficiently or partitioned into as many as seven isolated GPU instances with Multi-Instance GPU (MIG), providing a unified platform that lets elastic data centers adapt dynamically to shifting workload demands.

A100 is part of NVIDIA's complete data center solution, which includes building blocks for hardware, networking, software, libraries, and optimized AI models and applications from NGC. As the most powerful end-to-end AI and HPC data center platform, it enables researchers to deliver real-world results and deploy solutions at scale in production, while IT can optimize the use of every available A100 GPU.

CONVERGED ACCELERATION AT THE EDGE

The combination of the NVIDIA Ampere architecture with NVIDIA BlueField®-2 data processing units (DPUs) in NVIDIA converged accelerators delivers unprecedented compute and network acceleration to process the massive amounts of data generated in the data center and at the edge. BlueField-2 combines the power of NVIDIA ConnectX®-6 Dx with programmable Arm® cores and hardware offloads for software-defined storage, networking, security, and management workloads. NVIDIA converged accelerators let customers run data-intensive edge and data center workloads with maximum security and performance.

SECURE IMPLEMENTATION

Secure deployments are critical to enterprise business operations, and the NVIDIA Ampere architecture provides optional secure boot through trusted code authentication, along with hardened rollback protection, to guard against malware attacks, prevent operational losses, and keep workloads accelerated.

INSIGHT INTO THE NVIDIA AMPERE ARCHITECTURE

Discover the latest architecture technologies and the full range of GPUs built on them.