THE HEART OF AI AND HPC IN THE MODERN DATA CENTER
NVIDIA GPUs and converged accelerators are purpose-built for large-scale deployment, bringing networking, security, and small footprints to the cloud, data center, and edge.
BREAKTHROUGH INNOVATIONS
The NVIDIA Ampere architecture features six breakthrough innovations.
THIRD-GENERATION TENSOR CORES
First introduced in the NVIDIA Volta™ architecture, NVIDIA Tensor Core technology has dramatically accelerated AI, reducing training times from weeks to hours and massively speeding up inference. The NVIDIA Ampere architecture builds on these innovations and introduces new precisions - Tensor Float 32 (TF32) and floating point 64 (FP64) - to accelerate and simplify AI adoption and extend the power of Tensor Cores to HPC.
TF32 works just like FP32 while delivering up to 20x higher performance for AI, without requiring any code changes. Using NVIDIA Automatic Mixed Precision, researchers can gain an additional 2x performance with FP16 by adding just a few lines of code. And with support for bfloat16, INT8, and INT4, Tensor Cores in NVIDIA Ampere architecture GPUs become an incredibly versatile accelerator for both AI training and inference. Bringing the power of Tensor Cores to HPC, the A100 and A30 GPUs also enable matrix operations in full, IEEE-certified FP64 precision.
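As an illustration of the "few lines of code" claim, the following sketch shows how mixed precision might be enabled in a PyTorch training loop on an Ampere-class GPU. The model, data, and hyperparameters are placeholders chosen for the example, and the TF32 switch shown depends on the PyTorch version's defaults.

import torch

# Optional: explicitly allow TF32 for matmuls on Ampere-class GPUs
# (recent PyTorch versions expose this switch; defaults vary by version).
torch.backends.cuda.matmul.allow_tf32 = True

model = torch.nn.Linear(1024, 1024).cuda()                # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()                      # scales the loss to avoid FP16 underflow

inputs = torch.randn(64, 1024, device="cuda")             # placeholder data
targets = torch.randn(64, 1024, device="cuda")

for _ in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                       # runs eligible ops in FP16/TF32
        loss = torch.nn.functional.mse_loss(model(inputs), targets)
    scaler.scale(loss).backward()                         # backward pass on the scaled loss
    scaler.step(optimizer)                                # unscales gradients, then steps
    scaler.update()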

MULTI-INSTANCE GPU (MIG)
Every AI and HPC application can benefit from acceleration, but not every application needs the power of an entire GPU. Multi-Instance GPU (MIG) is a feature of A100 and A30 GPUs that lets multiple workloads share a single GPU. With MIG, each GPU can be partitioned into multiple GPU instances that run securely and in complete isolation at the hardware level, each with its own high-bandwidth memory, cache, and compute cores. Developers gain breakthrough acceleration for all applications, large and small, with guaranteed quality of service, while IT administrators can offer right-sized GPU acceleration for optimal utilization and extend access to every user and application in both bare-metal and virtualized environments.
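As a minimal sketch of how a workload lands on one of these isolated slices, the snippet below pins a process to a single MIG instance by its UUID. The UUID shown is a placeholder; on a real A100 or A30 it would be read from the output of nvidia-smi -L after an administrator has created the instances.

import os

# Placeholder MIG UUID - the real value comes from "nvidia-smi -L".
# It must be set before CUDA is initialized in this process.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

import torch

print(torch.cuda.device_count())       # 1: the process sees only its assigned slice
print(torch.cuda.get_device_name(0))   # e.g. an A100 MIG profile such as "... MIG 2g.10gb"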

THIRD-GENERATION NVLINK
Scaling applications across multiple GPUs requires extremely fast data movement. The third generation of NVIDIA® NVLink® in the NVIDIA Ampere architecture doubles the direct bandwidth between GPUs to 600 gigabytes per second (GB/s). This is nearly 10 times higher than PCIe Gen4. Combined with the latest generation NVIDIA NVSwitch™, all GPUs on the server can communicate with each other at full NVLink speed and transfer data extremely fast.
NVIDIA DGX™ A100 and servers from other leading computer manufacturers leverage NVLink and NVSwitch technologies via NVIDIA HGX™ A100 baseboards for greater scalability in HPC and AI workloads.
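A simple way to see whether two GPUs in such a server can talk to each other directly is to query peer-to-peer access and move a tensor between devices, as in the PyTorch sketch below. Whether the transfer actually travels over NVLink or falls back to PCIe depends on the system topology; an NVLink-connected multi-GPU node is assumed here.

import torch

if torch.cuda.device_count() >= 2:
    # True when direct peer-to-peer access (e.g. over NVLink) is available.
    print(torch.cuda.can_device_access_peer(0, 1))

    x = torch.randn(4096, 4096, device="cuda:0")
    y = x.to("cuda:1")                 # direct device-to-device copy
    torch.cuda.synchronize()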

STRUCTURAL SPARSITY
Modern AI networks are large and getting larger, with millions and in some cases billions of parameters. Not all of these parameters are needed for accurate prediction and inference; some can be converted to zeros, making the models "sparse" without sacrificing accuracy. Tensor Cores can deliver up to 2x higher performance for sparse models. And while the sparsity feature primarily benefits AI inference, it can also be used to improve the performance of model training.
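The pattern behind this feature is 2:4 structured sparsity: in every group of four consecutive weights, at most two are non-zero. The NumPy sketch below illustrates that layout by keeping only the two largest-magnitude values per group; it is a plain illustration of the pattern, not NVIDIA's pruning tooling.

import numpy as np

def prune_2_of_4(weights: np.ndarray) -> np.ndarray:
    """Zero out the two smallest-magnitude values in each group of four weights."""
    w = weights.reshape(-1, 4).copy()
    drop = np.argsort(np.abs(w), axis=1)[:, :2]   # indices of the two smallest |w| per group
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

dense = np.random.randn(8, 8).astype(np.float32)
sparse = prune_2_of_4(dense)

# Every group of four now holds at most two non-zero values (the 2:4 layout).
assert (sparse.reshape(-1, 4) != 0).sum(axis=1).max() <= 2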

SECOND-GENERATION RT CORES
The second-generation RT Cores of the NVIDIA Ampere architecture in NVIDIA A40 and A10 GPUs dramatically speed up workloads such as photorealistic rendering of movie content, architectural design evaluation, and virtual prototyping of product designs. RT Cores also accelerate the rendering of ray-traced motion blur for faster results with greater visual accuracy, and they can run ray tracing concurrently with shading or denoising.

SMARTER, FASTER MEMORY
A100 brings massive compute capacity to the data center. To keep that capacity fully utilized, it delivers 2 terabytes per second (TB/s) of memory bandwidth, double that of the previous generation. The A100 also has significantly more on-chip memory, including a 40 megabyte (MB) level 2 cache - seven times larger than in the previous generation - to maximize compute performance.
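As a rough way to see how much of that bandwidth a workload actually achieves, the sketch below times a large device-to-device copy in PyTorch. The transfer size and repeat count are arbitrary choices for the example, and the result is an estimate rather than a calibrated benchmark.

import time
import torch

n = 256 * 1024 * 1024                          # 256M float32 values = 1 GiB
src = torch.empty(n, dtype=torch.float32, device="cuda")
dst = torch.empty_like(src)

torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(10):
    dst.copy_(src)                             # each pass reads 1 GiB and writes 1 GiB
torch.cuda.synchronize()
elapsed = time.perf_counter() - t0

gib_moved = 10 * 2 * src.numel() * 4 / 2**30
print(f"~{gib_moved / elapsed:.0f} GiB/s effective bandwidth")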

NVIDIA Data Center GPUs enable researchers to deliver real-world results and deploy solutions into production at scale
NVIDIA-Certified Systems with NVIDIA A2, A30, and A100 Tensor Core GPUs and NVIDIA AI - including NVIDIA Triton Inference Server, open-source inference serving software - deliver breakthrough inference performance across edge, data center, and cloud. They ensure that AI-enabled applications can be deployed with fewer servers and less power, resulting in simpler deployments and faster insights at significantly lower cost.
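For context on what serving with Triton looks like from the client side, the sketch below sends one HTTP inference request with the tritonclient library. The server address, model name, and input/output tensor names and shapes are assumptions for the example and must match the deployed model's configuration.

import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")    # assumed server address

data = np.random.rand(1, 3, 224, 224).astype(np.float32)           # assumed input shape
inp = httpclient.InferInput("INPUT0", list(data.shape), "FP32")    # assumed input name
inp.set_data_from_numpy(data)

result = client.infer(model_name="my_model", inputs=[inp])         # assumed model name
print(result.as_numpy("OUTPUT0").shape)                            # assumed output name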

NVIDIA A2
The A2's versatility, compact size, and low power consumption exceed the requirements of large-scale edge deployments, making it possible to immediately upgrade existing entry-level CPU servers for inference. Servers accelerated with A2 GPUs deliver up to 20x higher inference performance than CPUs and 1.3x more efficient intelligent video analytics (IVA) deployments than previous GPU generations - all at an entry-level price.

NVIDIA A30
The NVIDIA A30 Tensor Core GPU provides a versatile platform for mainstream enterprise workloads such as AI inference, training, and HPC. With TF32 and FP64 Tensor Core support and an end-to-end software and hardware solution stack, the A30 ensures that mainstream AI training and HPC applications can be handled quickly. Multi-Instance GPU (MIG) ensures quality of service (QoS) with secure, hardware-partitioned GPU slices that are right-sized for each of these workloads and each user, making optimal use of the GPU's compute resources.

NVIDIA A16
Experience a new dimension of remote work with the NVIDIA A16, the ideal GPU for VDI with high user density and strong graphics performance. Based on the latest NVIDIA Ampere architecture, the A16 is purpose-built for the highest user density, supporting up to 64 concurrent users per board in a dual-slot form factor. Combined with NVIDIA Virtual PC (vPC) software, it provides the power to tackle any project from anywhere, and it delivers twice the user density of the previous generation while ensuring an optimal user experience.

NVIDIA A100 - 80GB
A100 is part of NVIDIA's complete data center solution, which includes building blocks for hardware, networking, software, libraries, and optimized AI models and applications from NGC. As the most powerful end-to-end AI and HPC data center platform, it enables researchers to deliver real-world results and deploy solutions at scale in production, while IT can optimize the use of every available A100 GPU.
CONVERGED ACCELERATION AT THE EDGE
The combination of the NVIDIA Ampere architecture with NVIDIA BlueField®-2 data processing units (DPUs) in NVIDIA converged accelerators provides unprecedented compute and network acceleration for processing the massive amounts of data generated in the data center and at the edge. BlueField-2 combines the power of NVIDIA ConnectX®-6 Dx with programmable Arm cores and hardware offloads for software-defined storage, networking, security, and management workloads. NVIDIA converged accelerators let customers run data-intensive edge and data center workloads with maximum security and performance.

SECURE DEPLOYMENT
Secure deployments are critical to enterprise business operations, and the NVIDIA Ampere architecture provides optional secure boot through trusted code authentication, along with proven rollback protection, to guard against malware attacks, prevent operational losses, and keep workloads accelerated.
