A leap of scale for accelerated computing
Transformational AI Training
NVIDIA H100 GPUs feature fourth-generation Tensor Cores and the Transformer Engine with FP8 precision, providing up to 9x faster training than the previous generation for Mixture of Experts (MoE) models. The combination of fourth-generation NVLink, which provides 900 gigabytes per second (GB/s) of GPU-to-GPU connectivity; NVSwitch, which accelerates collective communication across every GPU between nodes; PCIe Gen5; and NVIDIA Magnum IO™ software delivers efficient scalability from small enterprise systems to massive, unified GPU clusters.
Deploying H100 GPUs at data center scale delivers unprecedented performance and the next generation of exascale high-performance computing (HPC) and trillion-parameter AI for all researchers.
Real-Time Deep Learning Inference
AI solves a wide range of business challenges with an equally wide range of neural networks. A superior AI inference accelerator must provide not only the highest performance, but also the versatility to accelerate these networks.
H100 further extends NVIDIA's market-leading position in inference with several advances that accelerate inference by up to 30x and deliver the lowest latency. Fourth-generation Tensor Cores accelerate all precisions, including FP64, TF32, FP32, FP16, and INT8, and the Transformer Engine uses FP8 and FP16 together to reduce memory usage and increase performance while maintaining accuracy for large language models.
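As a rough illustration of the trade-off FP8 makes, the sketch below rounds a Python float to the nearest value representable in E4M3, one of the two FP8 formats the Transformer Engine uses. This is a simplified model for intuition only, not NVIDIA's implementation; the constants (3 mantissa bits, minimum normal exponent -6, maximum normal value 448) follow the published E4M3 definition.

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest value representable in FP8 E4M3
    (4 exponent bits, 3 mantissa bits, max normal value 448)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), 448.0)        # saturate at the E4M3 maximum
    e = math.floor(math.log2(mag))  # exponent of the leading bit
    e = max(e, -6)                  # below this, values are subnormal
    step = 2.0 ** (e - 3)           # 3 mantissa bits -> spacing 2^(e-3)
    return sign * round(mag / step) * step

# With only 3 mantissa bits, 3.3 lands on the nearest grid point, 3.25,
# and anything above 448 saturates rather than overflowing.
print(quantize_e4m3(3.3))     # 3.25
print(quantize_e4m3(1000.0))  # 448.0
```

The coarse spacing is why the Transformer Engine keeps FP16 alongside FP8 and applies per-tensor scaling: values must be kept inside the narrow E4M3 range to avoid saturation and precision loss.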
Exascale High-Performance Computing
The NVIDIA data center platform consistently delivers performance gains that go beyond Moore's Law. H100's new breakthrough AI capabilities further amplify the power of HPC + AI to accelerate time to discovery for scientists and researchers working to solve the world's most important challenges.
H100 triples the floating-point operations per second (FLOPS) of double-precision Tensor Cores, delivering 60 teraFLOPS of FP64 computing for HPC. AI-fused HPC applications can leverage H100's TF32 precision to achieve one petaFLOP of throughput for single-precision matrix-multiply operations, with zero code changes.
H100 also features DPX instructions, which deliver 7x higher performance than NVIDIA A100 Tensor Core GPUs, and 40x speedups over traditional dual-socket CPU-only servers, on dynamic programming algorithms such as Smith-Waterman for DNA sequence alignment.
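For readers unfamiliar with the workload DPX instructions target, the sketch below is a minimal pure-Python version of the Smith-Waterman dynamic program for local sequence alignment. It is meant only to show the recurrence structure being accelerated; the scoring parameters are illustrative defaults, not values from any NVIDIA library.

```python
def smith_waterman(a: str, b: str, match=3, mismatch=-3, gap=-2) -> int:
    """Best local-alignment score via the Smith-Waterman dynamic program.

    H[i][j] holds the best score of a local alignment ending at
    a[i-1] and b[j-1]; the answer is the maximum over the whole matrix.
    """
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamp at 0: a local alignment can always restart from scratch.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

print(smith_waterman("XXACGTXX", "TTACGTTT"))  # 12: the shared ACGT run
```

Each cell depends on three neighbors, so the inner max/add pattern dominates the runtime on long DNA sequences; that inner loop is exactly what DPX instructions accelerate in hardware.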
Data Analytics
Data analytics often takes up the majority of time in AI application development. Because large datasets are distributed across multiple servers, scale-out solutions built on CPU-only commodity servers are slowed down by a lack of scalable computing performance.
Accelerated servers with H100 deliver the compute power, along with 3 terabytes per second (TB/s) of memory bandwidth per GPU and scalability via NVLink and NVSwitch, to handle data analytics with high performance and scale to support massive datasets. Combined with NVIDIA Quantum-2 InfiniBand, Magnum IO software, GPU-accelerated Spark 3.0, and NVIDIA RAPIDS™, the NVIDIA data center platform accelerates these huge workloads with unmatched levels of performance and efficiency.
Enterprise Capacity Utilization
IT managers seek to maximize the utilization (both peak and average) of computing resources in the data center. They often use dynamic reconfiguration of compute power to right-size resources for the workloads they are using.
The second-generation Multi-Instance GPU (MIG) in H100 maximizes the utilization of each GPU by securely partitioning it into as many as seven separate instances. With confidential computing support, H100 enables secure end-to-end multi-tenant use, ideal for cloud service provider (CSP) environments.
Thanks to H100 with MIG, infrastructure managers can standardize their GPU-accelerated infrastructure while ensuring the flexibility to provision GPU resources with greater granularity to securely provide developers with the right amount of accelerated computing power and optimize the use of all their GPU resources.
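As a sketch of what that provisioning looks like in practice, the `nvidia-smi` commands below enable MIG mode and carve a single H100 into seven `1g.10gb` instances. Profile names and IDs vary by driver version, so list the supported profiles first; treat this as an illustrative outline, not an authoritative procedure.

```shell
# Enable MIG mode on GPU 0 (may require draining workloads and a GPU reset)
sudo nvidia-smi -i 0 -mig 1

# List the GPU instance profiles this driver supports (names and IDs vary)
sudo nvidia-smi mig -lgip

# Create seven 1g.10gb GPU instances with default compute instances (-C)
sudo nvidia-smi mig -cgi 1g.10gb,1g.10gb,1g.10gb,1g.10gb,1g.10gb,1g.10gb,1g.10gb -C

# Verify the resulting layout
sudo nvidia-smi mig -lgi

# Tear down: destroy compute instances first, then GPU instances
sudo nvidia-smi mig -dci
sudo nvidia-smi mig -dgi
```

Each instance then appears to schedulers and container runtimes as an independent device with its own memory, cache, and compute slice.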
NVIDIA Confidential Computing and Security
Modern confidential computing solutions are CPU-based, which is too limiting for compute-intensive workloads such as AI and HPC. NVIDIA Confidential Computing is a built-in security feature of the NVIDIA Hopper architecture, making the NVIDIA H100 the world's first accelerator with confidential computing capabilities. Users can protect the confidentiality and integrity of their data and applications while benefiting from the unparalleled acceleration of H100 GPUs for AI workloads. A hardware-based trusted execution environment (TEE) protects and isolates the entire workload, whether it runs on a single H100 GPU, multiple H100 GPUs within a node, or individual MIG instances. GPU-accelerated applications run unmodified within the TEE and do not need to be partitioned. Users can combine the power of NVIDIA software for AI and HPC with the security of the hardware root of trust offered by NVIDIA Confidential Computing.
NVIDIA H100 CNX Converged Accelerator
NVIDIA H100 CNX combines the performance of the NVIDIA H100 with the advanced networking capabilities of the NVIDIA ConnectX®-7 Smart Network Interface Card (SmartNIC) in a single, unique platform. This convergence delivers unprecedented performance for GPU-driven input/output (IO)-intensive workloads, such as distributed AI training in the enterprise data center and 5G processing at the edge.
Grace Hopper
The Hopper H100 Tensor Core GPU will support the NVIDIA Grace Hopper CPU+GPU architecture, purpose-built for terabyte-scale accelerated computing and delivering 10x higher performance on large-model AI and HPC. The NVIDIA Grace CPU leverages the flexibility of the Arm® architecture to create a CPU and server architecture designed from the ground up for accelerated computing. H100 is paired with Grace via NVIDIA's ultra-fast chip-to-chip interconnect, delivering 900 GB/s of bandwidth, 7x faster than PCIe Gen5. This innovative design delivers up to 30x the aggregate bandwidth of today's fastest servers and up to 10x the performance for applications processing multiple terabytes of data.
Technical Specifications
| Specification | H100 SXM | H100 PCIe |
| --- | --- | --- |
| FP64 | 30 teraFLOPS | 24 teraFLOPS |
| FP64 Tensor Core | 60 teraFLOPS | 48 teraFLOPS |
| FP32 | 60 teraFLOPS | 48 teraFLOPS |
| TF32 Tensor Core | 1,000 teraFLOPS* / 500 teraFLOPS | 800 teraFLOPS* / 400 teraFLOPS |
| BFLOAT16 Tensor Core | 2,000 teraFLOPS* / 1,000 teraFLOPS | 1,600 teraFLOPS* / 800 teraFLOPS |
| FP16 Tensor Core | 2,000 teraFLOPS* / 1,000 teraFLOPS | 1,600 teraFLOPS* / 800 teraFLOPS |
| FP8 Tensor Core | 4,000 teraFLOPS* / 2,000 teraFLOPS | 3,200 teraFLOPS* / 1,600 teraFLOPS |
| INT8 Tensor Core | 4,000 TOPS* / 2,000 TOPS | 3,200 TOPS* / 1,600 TOPS |
| GPU memory | 80 GB | 80 GB |
| GPU memory bandwidth | 3 TB/s | 2 TB/s |
| Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG |
| Max thermal design power (TDP) | 700 W | 350 W |
| Multi-Instance GPU | Up to 7 MIG instances @ 10 GB each | Up to 7 MIG instances @ 10 GB each |
| Form factor | SXM | PCIe |
| Interconnect | NVLink: 900 GB/s; PCIe Gen5: 128 GB/s | NVLink: 600 GB/s; PCIe Gen5: 128 GB/s |
| Server options | NVIDIA HGX™ H100 partner and NVIDIA-Certified Systems™ with 4 or 8 GPUs; NVIDIA DGX™ H100 with 8 GPUs | Partner and NVIDIA-Certified Systems with 1-8 GPUs |

\* With sparsity

Preliminary specifications; subject to change.