ACCELERATED COMPUTING PERFORMANCE, VIRTUALIZED

AI, deep learning, and data science demand unprecedented computing power. NVIDIA Virtual Compute Server (vCS) enables data centers to accelerate server virtualization with the latest NVIDIA data center GPUs, such as the NVIDIA A100 and A30 Tensor Core GPUs, so that even the most compute-intensive workloads, including artificial intelligence, deep learning, and data science, can run in a virtual machine (VM) with NVIDIA vGPU technology. This is not a small step for virtualization; it is a giant leap.


View document: vCS Solution Overview (PDF 280 KB)

Features

GPU sharing (fractional vGPU) is possible only with NVIDIA vGPU technology. It allows multiple VMs to share a single GPU, maximizing utilization for lighter workloads that require GPU acceleration.

With GPU aggregation, a VM can access more than one GPU, which compute-intensive workloads often require. vCS supports both multi-vGPU and peer-to-peer computing. With multi-vGPU, the GPUs are not directly connected; with peer-to-peer, they are connected through NVLink for higher bandwidth.
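Whether the GPUs passed into a VM are NVLink-connected can be checked from the guest through NVML. Below is a minimal sketch using the pynvml Python bindings (an assumption; install them with `pip install pynvml`) that lists the active NVLink links on each visible GPU:

```python
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    active_links = []
    for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
        try:
            state = pynvml.nvmlDeviceGetNvLinkState(handle, link)
        except pynvml.NVMLError:
            break  # this GPU exposes fewer links, or NVLink is absent (multi-vGPU case)
        if state == pynvml.NVML_FEATURE_ENABLED:
            active_links.append(link)
    print(f"GPU {i}: active NVLink links: {active_links}")
pynvml.nvmlShutdown()
```

An empty list indicates GPUs that are aggregated without a direct connection; peer-to-peer configurations report one or more active links.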

vCS supports monitoring at the application, guest, and host levels. In addition, proactive management capabilities make it possible to perform live migration, suspend and resume VMs, and set thresholds that reveal consumption trends affecting the user experience. All of this is exposed through the vGPU Management SDK.
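As an illustration, the kind of per-GPU metrics these tools surface can also be read programmatically via NVML. A minimal sketch with the pynvml bindings (an assumption for illustration; the vGPU Management SDK itself is a C API):

```python
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)  # percent over the last sample period
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)         # bytes
    print(f"GPU {i}: {util.gpu}% compute, {util.memory}% memory controller, "
          f"{mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB framebuffer in use")
pynvml.nvmlShutdown()
```

A monitoring agent could poll these values and raise an alert when a consumption threshold is crossed.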

NVIDIA GPU Cloud (NGC) is a hub for GPU-optimized software that simplifies workflows for deep learning, machine learning, and HPC, and now supports virtualized environments with NVIDIA vCS.

MORE INFORMATION

NVIDIA® NVLink™ is a fast, direct GPU-to-GPU interconnect that provides higher bandwidth, more links, and improved scalability for multi-GPU system configurations, now supported virtually with NVIDIA virtual GPU (vGPU) technology.

MORE INFORMATION

Error Correction Code (ECC) and Page Retirement provide increased reliability for computing applications that are susceptible to data corruption. They are especially important in large cluster computing environments where GPUs process very large data sets and/or run applications for long periods of time.
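Whether ECC is enabled, and how many memory pages have already been retired, can be checked from software. A small pynvml sketch, assuming the bindings are installed and the GPU supports ECC:

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

current, pending = pynvml.nvmlDeviceGetEccMode(handle)
print("ECC currently enabled:", current == pynvml.NVML_FEATURE_ENABLED)

# Pages retired so far, by cause: uncorrectable double-bit errors, and pages
# that accumulated too many correctable single-bit errors.
for cause, label in [
    (pynvml.NVML_PAGE_RETIREMENT_CAUSE_DOUBLE_BIT_ECC_ERROR, "double-bit ECC errors"),
    (pynvml.NVML_PAGE_RETIREMENT_CAUSE_MULTIPLE_SINGLE_BIT_ECC_ERRORS, "repeated single-bit ECC errors"),
]:
    pages = pynvml.nvmlDeviceGetRetiredPages(handle, cause)
    print(f"Pages retired due to {label}: {len(pages)}")
pynvml.nvmlShutdown()
```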

MORE INFORMATION

Multi-Instance GPU (MIG) is a revolutionary technology that extends the capabilities of the data center by allowing each NVIDIA A100 Tensor Core GPU to be partitioned into as many as seven fully isolated instances, each secured at the hardware level with its own dedicated high-bandwidth memory, cache, and compute cores. With vCS software, a VM can run on each of these MIG instances, allowing organizations to take advantage of the management, monitoring, and operational benefits of hypervisor-based server virtualization.
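A VM (or a bare-metal host) can discover whether MIG is enabled and enumerate the instances it sees via NVML. A minimal pynvml sketch, assuming a MIG-capable GPU and reasonably recent bindings:

```python
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

current, pending = pynvml.nvmlDeviceGetMigMode(handle)
if current == pynvml.NVML_DEVICE_MIG_ENABLE:
    for i in range(pynvml.nvmlDeviceGetMaxMigDeviceCount(handle)):
        try:
            mig = pynvml.nvmlDeviceGetMigDeviceHandleByIndex(handle, i)
        except pynvml.NVMLError:
            continue  # this slot is not populated with an instance
        info = pynvml.nvmlDeviceGetMemoryInfo(mig)
        print(f"MIG instance {i}: UUID={pynvml.nvmlDeviceGetUUID(mig)}, "
              f"{info.total / 2**30:.0f} GiB dedicated memory")
else:
    print("MIG is disabled on this GPU")
pynvml.nvmlShutdown()
```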

GPUDirect® Remote Direct Memory Access (RDMA) allows network devices to access GPU memory directly, bypassing CPU host memory, reducing GPU communication latency, and fully offloading the CPU.

MORE INFORMATION

SCALED FOR MAXIMUM EFFICIENCY

NVIDIA virtual GPUs deliver near bare-metal performance together with maximum utilization, management, and monitoring in a hypervisor-based virtualization environment for GPU-accelerated AI.

Performance scaling for deep learning training with vCS on NVIDIA A100 Tensor Core GPUs

Developers, data scientists, researchers, and students need massive computing power for deep learning training. The NVIDIA A100 Tensor Core GPU accelerates this work, so more gets done faster. NVIDIA Virtual Compute Server software delivers nearly the same performance as bare metal, even when scaling to large deep learning training models that use multiple GPUs.
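From the framework's point of view, multi-GPU scaling looks the same inside a VM as on bare metal. A minimal PyTorch sketch (illustrative only; large-scale jobs typically use DistributedDataParallel) that spreads one training step across every vGPU visible in the guest:

```python
import torch
import torch.nn as nn

# Toy model and batch; a real job would use DistributedDataParallel and a real dataset.
model = nn.Sequential(nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 10))
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # replicate across every GPU the VM can see
model = model.cuda()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(256, 1024, device="cuda")
labels = torch.randint(0, 10, (256,), device="cuda")

optimizer.zero_grad()
loss = loss_fn(model(inputs), labels)  # batch is scattered across the GPUs
loss.backward()
optimizer.step()
print(f"One step across {torch.cuda.device_count()} GPU(s), loss={loss.item():.3f}")
```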

Throughput performance for deep learning inference with MIG on NVIDIA A100 Tensor Core GPUs with vCS

Multi-Instance GPU (MIG) is a technology pioneered on the NVIDIA A100 Tensor Core GPU that partitions the GPU into as many as seven instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores. MIG can be used with Virtual Compute Server, with one VM per MIG instance, and performance is consistent whether an inference workload runs across multiple MIG instances on bare metal or virtualized with vCS.
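In practice, pinning an inference workload to one MIG instance is just a matter of device selection. A hedged sketch (the UUID below is a placeholder; list the real ones with `nvidia-smi -L`):

```python
import os

# Placeholder MIG UUID; substitute one reported by `nvidia-smi -L`.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

import torch  # imported after setting the variable so CUDA initializes against that instance

model = torch.nn.Linear(1024, 1000).eval().cuda()
with torch.inference_mode():
    logits = model(torch.randn(32, 1024, device="cuda"))
print(logits.shape)  # torch.Size([32, 1000])
```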

RESOURCES FOR IT MANAGERS

Learn how NVIDIA Virtual Compute Server maximizes performance and simplifies IT management.

Utilization optimization

Take full advantage of valuable GPU resources by seamlessly sharing GPUs across lighter workloads, such as inference, or by provisioning multiple virtual GPUs for more compute-intensive workloads, such as deep learning training.

Manageability and monitoring

Ensure the availability and readiness of the systems that data scientists and researchers need. Monitor GPU performance at the guest, host, and application levels. You can even leverage management tools like suspend/resume and live migration. Learn more about the operational benefits of GPU virtualization.

GPU RECOMMENDATIONS FOR COMPUTE WORKLOADS

|  | NVIDIA H100 PCIe | NVIDIA H100 SXM | NVIDIA A100 PCIe | NVIDIA A100 SXM |
| --- | --- | --- | --- | --- |
| GPU memory | 80 GB | 80 GB | 80 GB HBM2e | 80 GB HBM2e |
| GPU memory bandwidth | 2 TB/s | 3 TB/s | 1,935 GB/s | 2,039 GB/s |
| FP32 | 48 TFLOPS | 60 TFLOPS | 19.5 TFLOPS | 19.5 TFLOPS |
| Tensor Float 32 (TF32) | 800 TFLOPS* | 1,000 TFLOPS* | 156 TFLOPS | 312 TFLOPS* |
| FP64 | 24 TFLOPS | 30 TFLOPS | 9.7 TFLOPS | 9.7 TFLOPS |
| FP64 Tensor Core | 48 TFLOPS | 60 TFLOPS | 19.5 TFLOPS | 19.5 TFLOPS |
| BFLOAT16 Tensor Core | 1,600 TFLOPS* | 2,000 TFLOPS* | 312 TFLOPS | 624 TFLOPS* |
| FP16 Tensor Core | 1,600 TFLOPS* | 2,000 TFLOPS* | 312 TFLOPS | 624 TFLOPS* |
| INT8 Tensor Core | 3,200 TOPS* | 4,000 TOPS* | 624 TOPS | 1,248 TOPS* |
| Form factor | PCIe, dual-slot, air-cooled | SXM | PCIe, dual-slot air-cooled or single-slot liquid-cooled | SXM |
| Interconnect | NVLink: 600 GB/s; PCIe Gen5: 128 GB/s | NVLink: 900 GB/s; PCIe Gen5: 128 GB/s | NVIDIA® NVLink® Bridge for 2 GPUs: 600 GB/s**; PCIe Gen4: 64 GB/s | NVLink: 600 GB/s; PCIe Gen4: 64 GB/s |
| Server options | Partner and NVIDIA-Certified Systems™ with 1-8 GPUs | NVIDIA HGX™ H100 partner and NVIDIA-Certified Systems™ with 4 or 8 GPUs; NVIDIA DGX™ H100 with 8 GPUs | Partner and NVIDIA-Certified Systems™ with 1-8 GPUs | NVIDIA HGX™ A100 partner and NVIDIA-Certified Systems™ with 4, 8, or 16 GPUs; NVIDIA DGX™ A100 with 8 GPUs |
| Max thermal design power (TDP) | 350 W | 700 W | 300 W | 400 W*** |
| MIG support | Up to 7 MIGs @ 10 GB | Up to 7 MIGs @ 10 GB | Up to 7 MIGs @ 10 GB | Up to 7 MIGs @ 10 GB |

|  | NVIDIA A16 | NVIDIA A40 | NVIDIA A30 | NVIDIA A10 | NVIDIA A2 |
| --- | --- | --- | --- | --- | --- |
| GPU memory | 4x 16 GB GDDR6 | 48 GB GDDR6 with ECC | 24 GB HBM2 | 24 GB GDDR6 | 16 GB GDDR6 |
| GPU memory bandwidth | 4x 200 GB/s | 696 GB/s | 933 GB/s | 600 GB/s | 200 GB/s |
| FP32 | 18 (4x 4.5) TFLOPS | 37.4 TFLOPS | 10.3 TFLOPS | 31.2 TFLOPS | 4.5 TFLOPS |
| Tensor Float 32 (TF32) | 36 (4x 9) TFLOPS | 149.6 TFLOPS | 165 TFLOPS | 125 TFLOPS | 18 TFLOPS* |
| FP64 | 542.4 (4x 135.6) GFLOPS | 584.6 GFLOPS | 5.2 TFLOPS | 976.3 GFLOPS | 70.8 GFLOPS |
| FP64 Tensor Core | - | - | 10.3 TFLOPS | - | - |
| BFLOAT16 Tensor Core | - | 299.4 TFLOPS | 330 TFLOPS | 250 TFLOPS | 36 TFLOPS* |
| FP16 Tensor Core | 71.6 (4x 17.9) TFLOPS | 299.4 TFLOPS | 330 TFLOPS | 250 TFLOPS | 36 TFLOPS* |
| INT8 Tensor Core | 598.6 TOPS | 598.6 TOPS | 661 TOPS* | 500 TOPS* | 72 TOPS* |
| Form factor | Full-height, full-length (FHFL), dual-slot | 4.4" (H) x 10.5" (L), dual-slot | Dual-slot, full-height, full-length (FHFL) | Single-slot FHFL | Single-slot, low-profile PCIe |
| Interconnect | - | NVIDIA® NVLink®: 112.5 GB/s (bidirectional); PCIe Gen4: 64 GB/s | PCIe Gen4: 64 GB/s; third-gen NVIDIA® NVLink®: 200 GB/s | PCIe Gen4: 64 GB/s | PCIe Gen4 x8 |
| Max thermal design power (TDP) | 250 W | 300 W | 165 W | 150 W | 40-60 W (configurable) |
| MIG support | - | - | 4 MIGs @ 6 GB each, 2 MIGs @ 12 GB each, or 1 MIG @ 24 GB | - | - |
| vGPU software support | NVIDIA Virtual PC (vPC), NVIDIA Virtual Applications (vApps), NVIDIA RTX Virtual Workstation (vWS), NVIDIA AI Enterprise, NVIDIA Virtual Compute Server (vCS) | - | NVIDIA AI Enterprise, NVIDIA Virtual Compute Server | NVIDIA vPC/vApps, NVIDIA RTX™ vWS, NVIDIA AI Enterprise | NVIDIA Virtual PC (vPC), NVIDIA Virtual Applications (vApps), NVIDIA RTX Virtual Workstation (vWS), NVIDIA AI Enterprise, NVIDIA Virtual Compute Server (vCS) |
* With sparsity.
** SXM4 GPUs via HGX A100 server cards; PCIe GPUs via NVLink Bridge for up to two GPUs.
*** 400W TDP for standard configuration. HGX A100-80GB CTS (Custom Thermal Solution) SKU can support TDPs up to 500W.

VIRTUALIZATION PARTNERS