Faster insights can save time, money, and even lives. That's why companies in every industry want to harness the data generated by billions of IoT sensors to transform themselves. To do this, they need powerful distributed computing and secure, easy management. NVIDIA Edge Computing solutions bring together NVIDIA-Certified Systems™, embedded platforms, AI software, and turnkey management services designed for AI at the network edge.

UNITING NETWORKING AND COMPUTING POWER

NVIDIA converged accelerators combine the powerful performance of the NVIDIA Ampere architecture with the enhanced security and latency-reduction capabilities of the NVIDIA® BlueField®-2 data processing unit (DPU). With converged accelerators, enterprises can create faster, more efficient, and more secure AI systems in the data center and at the edge.

UNPRECEDENTED GPU PERFORMANCE

For a wide range of compute-intensive workloads, the NVIDIA Ampere architecture delivers its largest generational performance leap ever, further securing and accelerating enterprise and edge infrastructure.

ENHANCED SECURITY

The NVIDIA BlueField-2 DPU delivers innovative acceleration, security, and efficiency for any host. BlueField-2 combines the performance of NVIDIA ConnectX®-6 Dx with Arm® programmable cores and hardware offloads for software-defined storage, networking, security, and management workloads.

FASTER DATA SPEEDS

NVIDIA converged accelerators offer an integrated PCIe Gen4 switch. This allows data to be transferred between the GPU and DPU without going through the server PCIe system. Even in systems with PCIe Gen3 on the host, communication occurs at full PCIe Gen4 speed. This enables new levels of data center efficiency and security for GPU-accelerated workloads, including AI-based security, 5G telecom, and other edge applications.
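To put those link rates in perspective, here is a minimal sketch in plain Python of the theoretical per-direction bandwidth of an x16 link, using the signaling rates from the PCIe 3.0 and 4.0 specifications (both generations use 128b/130b encoding):

```python
# Theoretical per-direction PCIe bandwidth for an x16 link.
# Gen3 signals at 8 GT/s per lane, Gen4 at 16 GT/s; with 128b/130b
# encoding, 128 of every 130 transferred bits are payload.
def pcie_x16_gb_per_s(gt_per_s: float, lanes: int = 16) -> float:
    payload_gbit_per_s = gt_per_s * lanes * (128 / 130)
    return payload_gbit_per_s / 8  # convert Gb/s to GB/s

print(f"Gen3 x16: {pcie_x16_gb_per_s(8):.1f} GB/s")   # ~15.8 GB/s
print(f"Gen4 x16: {pcie_x16_gb_per_s(16):.1f} GB/s")  # ~31.5 GB/s
```

The integrated switch keeps GPU-DPU traffic on the card at the Gen4 rate even when the host slot only negotiates Gen3.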

COMPLETE INFERENCE PORTFOLIO

NVIDIA offers a complete portfolio of NVIDIA-Certified Systems with Ampere-architecture Tensor Core GPUs as the inference engine for NVIDIA AI. The A2 Tensor Core GPU adds an entry-level inference solution in a low-profile form factor to the NVIDIA AI portfolio, which already includes the A100 and A30 Tensor Core GPUs. With a configurable power draw as low as 40W, the A2 fits into any server, making it ideal for far-edge deployments. The A100 provides the highest inference performance at any scale for compute-intensive applications, while the A30 delivers optimal inference performance for mainstream servers. NVIDIA-Certified Systems with the NVIDIA A100, A30, and A2 Tensor Core GPUs deliver leading inference performance across cloud, data center, and edge, ensuring that AI-enabled applications can be deployed with fewer servers and less power, resulting in faster insights at significantly lower cost.

NVIDIA A100

The world's fastest graphics processor with the world's fastest memory
  • Fastest computing
  • FP64 precision
  • 40GB or 80GB HBM memory
  • 1.3x faster memory bandwidth; the world's first GPU with over 2TB/s
  • Up to 7 MIG instances

NVIDIA A30

Versatile compute acceleration for mainstream enterprise servers
  • Mainstream computing power
  • FP64 precision
  • 24 GB HBM memory
  • NVLink
  • Up to 4 MIG instances

NVIDIA A2

Versatile entry-level GPU brings NVIDIA AI to any server
  • AI Inference, IVA, Edge
  • Fits any server: Low power (40-60W) and low profile
  • Ampere 3rd-gen Tensor Cores, 2nd-gen RT Cores
  • Entry-level price

NVIDIA converged accelerators combine the performance of the NVIDIA Ampere architecture with the advanced security and networking features of the NVIDIA® BlueField®-2 data processing unit (DPU) in a single high-performance package. This architecture delivers unprecedented performance for AI-powered workloads in edge computing, telecommunications, and network security.

The NVIDIA Ampere architecture delivers its biggest performance leap ever for a wide range of compute-intensive workloads, while BlueField-2 combines the power of NVIDIA ConnectX®-6 Dx with programmable Arm® cores and hardware offloads for software-defined storage, networking, security, and management workloads. NVIDIA converged accelerators feature an integrated PCIe Gen4 switch that allows data to be transferred between the GPU and DPU without passing through the host PCIe system. This enables a new level of data center efficiency and security for network-intensive, GPU-accelerated workloads.

TECHNICAL DETAILS

Feature | A100X | A30X
GPU Memory | 80 GB HBM2e | 24 GB HBM2e
Memory Bandwidth | 1,800 GB/s | 900 GB/s
MIG Instances | 7 @ 10 GB, 3 @ 20 GB, or 2 @ 40 GB each | Up to 4
Interconnect | PCIe Gen4 (x16 physical, x8 electrical) | PCIe Gen4 (x16 physical, x8 electrical)
NVLink Bridge | 3x | 1x
Form Factor | 2-slot FHFL | 2-slot FHFL
Max Power | 300 W | 230 W

FASTER 5G

NVIDIA Aerial™ is designed for building high-performance, software-defined, cloud-native 5G applications to meet growing consumer demand. It enables GPU-accelerated signal and data processing for 5G radio access networks (RANs). NVIDIA converged accelerators are the most powerful platform for running 5G applications: because data does not have to traverse the host PCIe system, processing latency is significantly reduced, and the resulting higher throughput enables higher subscriber density per server.
[Image: 5G infrastructure]

AI-BASED CYBERSECURITY

Converged accelerators open new possibilities for AI-based cybersecurity and networking. The DPU's Arm cores can be programmed with the NVIDIA Morpheus application framework to perform GPU-accelerated advanced networking functions such as threat detection, data leakage prevention, and anomalous behavior detection. GPU processing can be applied directly to network traffic at high data rates, and data travels a direct path between the GPU and DPU, providing better isolation. A toy example of this kind of detection follows below.
[Image: AI-based cybersecurity]
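As an illustration of the anomalous-behavior detection mentioned above, here is a minimal, hypothetical sketch using scikit-learn's IsolationForest on made-up flow statistics; it stands in for the GPU-accelerated Morpheus pipelines and is not the Morpheus API itself:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical per-flow features: bytes/s, packets/s, distinct ports, duration (s).
rng = np.random.default_rng(0)
normal_flows = rng.normal(loc=[5e5, 400, 3, 30],
                          scale=[1e5, 80, 1, 10],
                          size=(5_000, 4))

# Fit an unsupervised anomaly detector on "normal" traffic only.
detector = IsolationForest(contamination=0.01, random_state=0).fit(normal_flows)

# An exfiltration-like burst: huge byte rate, many ports, short-lived.
suspect = np.array([[5e7, 9_000, 250, 2.0]])
print(detector.predict(suspect))  # [-1] flags the flow as anomalous
```

In a production pipeline, feature extraction and model inference would run on the converged card itself, directly on traffic arriving at the DPU.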

ACCELERATE AI-ON-5G AT THE EDGE

NVIDIA AI-on-5G consists of the NVIDIA EGX™ platform, the NVIDIA Aerial™ SDK for software-defined 5G virtual radio access networks (vRANs), and enterprise AI frameworks, including SDKs such as NVIDIA Isaac™ and NVIDIA Metropolis™. This platform enables edge devices such as video cameras, industrial sensors, and robots to leverage AI and communicate with the data center over 5G. Converged cards enable all of these capabilities to be deployed in a single enterprise server without costly purpose-built systems. The same converged card that accelerates 5G signal processing can also be used for edge AI, with NVIDIA Multi-Instance GPU (MIG) technology partitioning the GPU among multiple applications, as sketched below.
[Image: AI-on-5G at the edge]
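As a sketch of what MIG partitioning looks like from software, the snippet below uses the pynvml bindings (the `nvidia-ml-py` package) to check whether MIG mode is enabled and to list the memory of each configured MIG slice; device index 0 is an assumption:

```python
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes the converged card is GPU 0

# MIG mode is reported as a (current, pending) pair of flags.
current, pending = pynvml.nvmlDeviceGetMigMode(gpu)
print("MIG enabled:", current == pynvml.NVML_DEVICE_MIG_ENABLE)

# Enumerate the MIG instances that have been created on this GPU.
for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(gpu)):
    try:
        mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(gpu, i)
    except pynvml.NVMLError:
        continue  # index not backed by a configured MIG instance
    mem = pynvml.nvmlDeviceGetMemoryInfo(mig)
    print(f"MIG slice {i}: {mem.total / 2**30:.1f} GiB total memory")

pynvml.nvmlShutdown()
```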

BALANCED, OPTIMIZED DESIGN

Integrating the GPU, DPU, and PCIe switch into a single device creates an inherently balanced architecture. In systems where multiple GPUs and DPUs are desired, a converged accelerator card avoids contention on the server's PCIe system, so performance scales linearly with additional devices. Converged cards also make performance far more predictable. Consolidating these components onto one physical card likewise reduces space and power requirements. Finally, converged cards greatly simplify deployment and ongoing maintenance, especially when installed at scale in mainstream servers.
[Image: NVIDIA converged accelerator card]
BUILT-IN BLUEFIELD-2 DPU

● 100GbE, dual-port QSFP56, PCIe 4.0 x8, Ethernet and InfiniBand, PAM4/NRZ, ConnectX-6 Dx inside
● 8-core Arm® Cortex®-A72 CPU subsystem running at over 2.0 GHz
● 8 MB L2 cache, 6 MB L3 cache in 4 tiles, fully coherent low-latency interconnect
● Integrated PCIe switch, 16 lanes Gen 4.0, PCIe root complex or endpoint modes
● Single DDR4 memory channel

DEVELOPER ECOSYSTEM

NVIDIA converged accelerators extend the capabilities of the CUDA® and NVIDIA DOCA™ programming libraries for workload acceleration and offloading.

CUDA applications can run on the x86 host or on the DPU's Arm cores for isolated AI and inference applications.
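As a minimal illustration of that portability, the CuPy sketch below runs the same GPU SAXPY unchanged whether the Python process is hosted on the x86 server or on the DPU's Arm cores, assuming a CUDA-enabled CuPy build is installed for that platform:

```python
import cupy as cp

# y = a*x + y (SAXPY); the elementwise operations execute as CUDA
# kernels on the GPU, while the host CPU (x86 or the BlueField-2's
# Arm cores) only runs the launch logic.
a = cp.float32(2.0)
x = cp.random.random(1 << 20).astype(cp.float32)
y = cp.random.random(1 << 20).astype(cp.float32)

y = a * x + y
print(float(y[:4].sum()))  # copy a small result back to the host
```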

DISCOVER NVIDIA'S CONVERGED ACCELERATORS

These devices enable data-intensive workloads to run at the edge and in the data center with maximum security and performance.
[Image: NVIDIA converged accelerator architecture]

A30X

A30X combines the NVIDIA A30 Tensor Core GPU with the BlueField-2 DPU. With MIG, the GPU can be partitioned into up to four GPU instances, each running a separate service. The design of this board provides a good balance between compute and I/O performance for use cases such as 5G vRAN and AI-based cybersecurity. Multiple services can run on the GPU simultaneously and benefit from the low latency and predictable performance of the integrated PCIe switch.

A100X

The A100X combines the power of the NVIDIA A100 Tensor Core GPU with the BlueField-2 DPU. With MIG, each A100 can be partitioned into up to seven GPU instances, allowing even more services to run simultaneously. A100X is ideal for use cases with more intensive compute requirements, such as 5G with massive multiple-input, multiple-output (MIMO) capabilities, AI-on-5G deployments, and specialized workloads such as signal processing and multi-node training.

The NVIDIA A2 Tensor Core GPU provides entry-level inference with low power, a small footprint, and high performance for intelligent video analytics (IVA) with NVIDIA AI at the edge. As a low-profile PCIe Gen4 card with a low, configurable thermal design power (TDP) of 40-60 watts (W), the A2 brings versatile inference acceleration to any server.

The A2's versatility, compact size, and low power consumption make it ideal for large-scale edge deployments, enabling existing entry-level CPU servers to be upgraded for inference immediately. Servers accelerated with A2 GPUs offer up to 20x higher inference performance than CPUs and 1.3x more efficient IVA deployments than previous GPU generations, all at an entry-level price.
NVIDIA-Certified Systems™ with NVIDIA A2, A30, and A100 Tensor Core GPUs and NVIDIA AI, including NVIDIA Triton™ Inference Server, open-source inference-serving software, provide breakthrough inference performance for the edge, data center, and cloud. They ensure that AI-enabled applications use fewer servers and less power, resulting in simpler deployments and faster insights at significantly lower cost.
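As a sketch of how an application might query such a deployment, the snippet below uses Triton's Python HTTP client; the model name `resnet50` and its tensor names are placeholders for whatever sits in your model repository:

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to a running Triton Inference Server (default HTTP port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Placeholder model and tensor names; match them to your model repository.
image = np.random.rand(1, 3, 224, 224).astype(np.float32)
inputs = [httpclient.InferInput("input__0", list(image.shape), "FP32")]
inputs[0].set_data_from_numpy(image)
outputs = [httpclient.InferRequestedOutput("output__0")]

result = client.infer(model_name="resnet50", inputs=inputs, outputs=outputs)
print(result.as_numpy("output__0").shape)  # e.g. (1, 1000) class scores
```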

UP TO 20X MORE INFERENCE PERFORMANCE

AI inference is used to improve consumers' lives through real-time intelligent experiences and to extract insights from trillions of endpoint sensors and cameras. Compared to CPU-only servers, edge and entry-level servers with NVIDIA A2 Tensor Core GPUs offer up to 20x more inference performance, instantly empowering any server to handle modern AI.
[Image: NVIDIA A2 inference performance comparison]
System configuration: CPU: HPE DL380 Gen10 Plus, 2S Xeon Gold 6330N @2.2 GHz, 512 GB DDR4
NLP: BERT-Large (sequence length: 384, SQuAD v1.1) | TensorRT 8.2, precision: INT8, BS: 1 (GPU) | OpenVINO 2021.4, precision: INT8, BS: 1 (CPU)
Text-to-Speech: Tacotron2 + WaveGlow end-to-end pipeline (input length: 128) | PyTorch 1.9, precision: FP16, BS: 1 (GPU) | PyTorch 1.9, precision: FP32, BS: 1 (CPU)
Computer Vision: EfficientDet-D0 (COCO, 512x512) | TensorRT 8.2, precision: INT8, BS: 8 (GPU) | OpenVINO 2021.4, precision: INT8, BS: 8 (CPU)


HIGHER IVA PERFORMANCE FOR THE SMART EDGE

Servers equipped with NVIDIA A2 GPUs deliver up to 1.3x better performance in smart-edge use cases, including smart cities, manufacturing, and retail. For IVA workloads, NVIDIA A2 GPUs enable more efficient deployments, with up to 1.6x better price/performance and 10 percent better power efficiency than previous GPU generations.

IVA PERFORMANCE (NORMALIZED)

[Image: IVA performance]
System configuration: [Supermicro SYS-1029GQ-TRT, 2S Xeon Gold 6240 @2.6 GHz, 512 GB DDR4, 1x NVIDIA A2 or 1x NVIDIA T4] | Performance measured with DeepStream 5.1. Networks: ShuffleNet-v2 (224 x 224), MobileNet-v2 (224 x 224). | The pipeline measures end-to-end performance across video capture and decode, pre-processing, batching, inference, and post-processing.

OPTIMIZED FOR ANY SERVER

The NVIDIA A2 GPU is optimized for inference workloads and deployment in entry-level servers constrained by space and thermal requirements, such as 5G edge and industrial environments. The A2 offers a compact form factor with low power consumption and a TDP configurable from 40W up to 60W, making it ideal for any server.

LOWER POWER CONSUMPTION AND CONFIGURABLE TDP

[Image: TDP operating range]
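The configurable TDP can also be inspected programmatically. Here is a minimal sketch using the pynvml bindings to read the supported power-limit range; setting a new limit requires administrator privileges, so it is shown commented out:

```python
import pynvml

pynvml.nvmlInit()
gpu = pynvml.nvmlDeviceGetHandleByIndex(0)  # assumes the A2 is GPU 0

# NVML reports power limits in milliwatts.
low, high = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(gpu)
print(f"Configurable TDP range: {low / 1000:.0f}-{high / 1000:.0f} W")
print(f"Current limit: {pynvml.nvmlDeviceGetPowerManagementLimit(gpu) / 1000:.0f} W")

# Requires root; e.g. cap the card at its minimum supported TDP:
# pynvml.nvmlDeviceSetPowerManagementLimit(gpu, low)

pynvml.nvmlShutdown()
```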

TECHNICAL DETAILS OF A2

Peak FP32 | 4.5 TF
TF32 Tensor Core | 9 TF | 18 TF¹
BFLOAT16 Tensor Core | 18 TF | 36 TF¹
Peak INT8 Tensor Core | 36 TOPS | 72 TOPS¹
Peak INT4 Tensor Core | 72 TOPS | 144 TOPS¹
RT Cores | 10
Media engines | 1 video encoder, 2 video decoders (includes AV1 decode)
GPU memory | 16 GB GDDR6
GPU memory bandwidth | 200 GB/s
Interconnect | PCIe Gen4 x8
Form factor | 1-slot, low-profile PCIe
Max thermal design power (TDP) | 40-60 W (configurable)
Virtual GPU (vGPU) software support² | NVIDIA Virtual PC (vPC), NVIDIA Virtual Applications (vApps), NVIDIA RTX Virtual Workstation (vWS), NVIDIA AI Enterprise, NVIDIA Virtual Compute Server (vCS)

CONTACT US

Don't miss the release of the A100X and A30X converged accelerators.