A leap of scale for accelerated computing

With the NVIDIA H100 Tensor Core GPU, you benefit from unprecedented performance, scalability, and security for every workload. With the NVIDIA® NVLink® Switch System, up to 256 H100 GPUs can be connected to accelerate exascale workloads, while the dedicated Transformer Engine supports trillion-parameter language models. H100 uses innovations in the NVIDIA Hopper™ architecture to deliver industry-leading conversational AI, speeding up large language models by up to 30x over the previous generation.
Up to 9 times faster AI training for the largest models
Mixture of Experts (395 billion parameters)
Projected performance subject to change. Training Mixture of Experts (MoE) Transformer Switch-XXL variant with 395B parameters on a 1T-token dataset | A100 cluster: HDR IB network | H100 cluster: NVLink Switch System, NDR IB

Transformational AI Training

NVIDIA H100 GPUs feature fourth-generation Tensor Cores and the Transformer Engine with FP8 precision, which together provide up to 9x faster training than the previous generation for Mixture of Experts (MoE) models. The combination of fourth-generation NVLink, which provides 900 gigabytes per second (GB/s) of GPU-to-GPU connectivity; NVSwitch, which accelerates collective communication by every GPU across nodes; PCIe Gen 5; and NVIDIA Magnum IO™ software delivers efficient scalability from small enterprise systems to massive, unified GPU clusters.

Deploying H100 GPUs at data center scale delivers unprecedented performance and puts the next generation of exascale high-performance computing (HPC) and trillion-parameter AI within reach of every researcher.

Real-Time Deep Learning Inference

AI solves a wide range of business challenges with an equally wide range of neural networks. A superior AI inference accelerator must provide not only the highest performance, but also the versatility to accelerate these networks.

H100 further extends NVIDIA's market-leading position in inference with several advances that accelerate inference by up to 30x while delivering the lowest latency. Fourth-generation Tensor Cores speed up all precisions, including FP64, TF32, FP32, FP16, and INT8, and the Transformer Engine combines FP8 and FP16 to reduce memory usage and increase performance while maintaining accuracy for large language models.
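As an illustration of how the Transformer Engine is typically driven from a framework, the sketch below runs a single linear layer under FP8 autocast using NVIDIA's Transformer Engine library for PyTorch. The layer sizes and recipe settings are arbitrary assumptions for this example, not H100-specific guidance.

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# Assumed example dimensions; any Transformer-style layer would do.
layer = te.Linear(768, 768, bias=True)
x = torch.randn(512, 768, device="cuda")

# Delayed-scaling FP8 recipe; HYBRID uses E4M3 forward and E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)  # the matmul runs through FP8 Tensor Cores on Hopper
print(y.shape)
```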

Up to 30 times higher AI inference performance for the largest models
Megatron Chatbot Inference (530 Billion Parameters)
Projected performance subject to change. Inference on a chatbot based on the Megatron 530B parameter model, with input sequence length = 128 and output sequence length = 20 | A100 cluster: HDR IB network | H100 cluster: NVLink Switch System, NDR IB
Up to 7 times higher performance for HPC applications
Projected performance subject to change. 3D FFT (4K^3) throughput | A100 cluster: HDR IB network | H100 cluster: NVLink Switch System, NDR IB | Genome Sequencing (Smith-Waterman) | 1 A100 | 1 H100

Exascale High-Performance Computing

The NVIDIA data center platform consistently delivers performance gains that go beyond Moore's Law. H100's new breakthrough AI capabilities further amplify the power of HPC + AI to accelerate time to discovery for scientists and researchers working to solve the world's most important challenges.

H100 triples the floating-point operations per second (FLOPS) of the double-precision Tensor Cores, delivering 60 teraFLOPS of FP64 computing for HPC. AI-powered HPC applications can leverage H100's TF32 precision to achieve one petaFLOP of throughput for single-precision matrix multiplication operations, without any code changes.
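For example, in a framework such as PyTorch, TF32 Tensor Core math can be enabled with two flags, and existing single-precision matrix multiplications then run on Tensor Cores unchanged. This is a minimal framework-level sketch, not an H100-specific requirement.

```python
import torch

# Allow FP32 matmuls and cuDNN convolutions to use TF32 Tensor Core math.
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

a = torch.randn(4096, 4096, device="cuda")
b = torch.randn(4096, 4096, device="cuda")
c = a @ b  # existing single-precision code path, now executed with TF32
```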

H100 also features DPX instructions, which deliver 7x higher performance than NVIDIA A100 Tensor Core GPUs on dynamic programming algorithms such as Smith-Waterman for DNA sequence alignment, and a 40x speedup over traditional dual-socket CPU-only servers.
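To make concrete what kind of computation DPX instructions target, here is a plain-Python sketch of the Smith-Waterman dynamic-programming recurrence (scoring values are illustrative); the hardware accelerates exactly this kind of cell-by-cell max-and-add inner loop.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local-alignment score between sequences a and b."""
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Each cell is a max over additions: the pattern DPX accelerates.
            h[i][j] = max(0, diag, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best

print(smith_waterman("GATTACA", "GCATGCT"))
```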

Data Analytics

Data analytics often consumes the majority of time in AI application development. Because large datasets are scattered across multiple servers, scale-out solutions built on commodity CPU-only servers get bogged down by a lack of scalable computing performance.

Accelerated servers with H100 deliver the compute power, along with 3 terabytes per second (TB/s) of memory bandwidth per GPU and scalability via NVLink and NVSwitch, to handle data analytics with high performance and scale to support massive datasets. Combined with NVIDIA Quantum-2 InfiniBand, Magnum IO software, GPU-accelerated Spark 3.0, and NVIDIA RAPIDS™, the NVIDIA data center platform accelerates these huge workloads with unmatched levels of performance and efficiency.
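As a small, hypothetical example of GPU-accelerated analytics with RAPIDS, the snippet below loads a CSV into a cuDF DataFrame and aggregates it on the GPU; the file name and column names are placeholders invented for illustration, not part of any NVIDIA example.

```python
import cudf  # RAPIDS GPU DataFrame library

# Hypothetical dataset and columns, used only for illustration.
df = cudf.read_csv("transactions.csv")
totals = (
    df.groupby("customer_id")["amount"]
      .sum()
      .sort_values(ascending=False)
)
print(totals.head(10))  # the entire pipeline executes on the GPU
```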

Enterprise Capacity Utilization

IT managers seek to maximize the utilization (both peak and average) of compute resources in the data center. They often employ dynamic reconfiguration of compute to right-size resources for the workloads in use.

Second-generation Multi-Instance GPU (MIG) technology in H100 maximizes the utilization of each GPU by securely partitioning it into as many as seven separate instances. With confidential computing support, H100 enables secure, end-to-end, multi-tenant use, ideal for cloud service provider (CSP) environments.

With H100 and MIG, infrastructure managers can standardize their GPU-accelerated infrastructure while retaining the flexibility to provision GPU resources at finer granularity, securely giving developers the right amount of accelerated compute and optimizing utilization of all their GPU resources.
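From an application's point of view, a MIG instance simply looks like a dedicated GPU. A minimal sketch, assuming a MIG-enabled H100 and a placeholder instance UUID (list the real UUIDs with nvidia-smi -L):

```python
import os

# Placeholder MIG UUID; replace with a real one reported by `nvidia-smi -L`.
os.environ["CUDA_VISIBLE_DEVICES"] = "MIG-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"

import torch  # must be imported after CUDA_VISIBLE_DEVICES is set

print(torch.cuda.device_count())      # the MIG slice appears as a single device
print(torch.cuda.get_device_name(0))  # work runs inside the isolated partition
```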


MORE INFORMATION ON MIG

NVIDIA Confidential Computing and Security

Existing confidential computing solutions are CPU-based, which is too limiting for compute-intensive workloads such as AI and HPC. NVIDIA Confidential Computing is a built-in security feature of the NVIDIA Hopper architecture that makes the NVIDIA H100 the world's first accelerator with confidential computing capabilities. Users can protect the confidentiality and integrity of their data and applications while benefiting from the unparalleled acceleration of H100 GPUs for AI workloads. A hardware-based trusted execution environment (TEE) protects and isolates the entire workload, whether it runs on a single H100 GPU, multiple H100 GPUs within a node, or individual MIG instances. GPU-accelerated applications can run unmodified inside the TEE and do not need to be partitioned. Users can combine the power of NVIDIA software for AI and HPC with the security of a hardware root of trust offered by NVIDIA Confidential Computing.


LEARN MORE ABOUT CONFIDENTIAL COMPUTING FROM NVIDIA

NVIDIA H100 CNX Converged Accelerator

NVIDIA H100 CNX combines the performance of the NVIDIA H100 with the advanced networking capabilities of the NVIDIA ConnectX®-7 Smart Network Interface Card (SmartNIC) in a single, unique platform. This convergence delivers unprecedented performance for GPU-driven input/output (IO)-intensive workloads, such as distributed AI training in the enterprise data center and 5G processing at the edge.


LEARN MORE ABOUT NVIDIA H100 CNX

Grace Hopper

The H100 Tensor Core GPU will power the NVIDIA Grace Hopper CPU+GPU architecture, purpose-built for terabyte-scale accelerated computing and delivering 10x higher performance on large-model AI and HPC. The NVIDIA Grace CPU leverages the flexibility of the Arm® architecture to create a CPU and server architecture designed from the ground up for accelerated computing. H100 is paired with Grace over NVIDIA's ultra-fast chip-to-chip interconnect, delivering 900 GB/s of bandwidth, 7x faster than fifth-generation PCIe. This innovative design delivers up to 30x the aggregate bandwidth of today's fastest servers and up to 10x higher performance for applications processing terabytes of data.


LEARN MORE ABOUT GRACE

Technical Specifications

Form factor
H100 SXM
H100 PCIe
FP64
30 teraFLOPS
24 teraFLOPS
FP64 Tensor Core
60 teraFLOPS
48 teraFLOPS
FP32
60 teraFLOPS
48 teraFLOPS
TF32 Tensor Core
1,000 teraFLOPS* | 500 teraFLOPS
800 teraFLOPS* | 400 teraFLOPS
BFLOAT16 Tensor Core
2,000 teraFLOPS* | 1,000 teraFLOPS
1,600 teraFLOPS* | 800 teraFLOPS
FP16 Tensor Core
2,000 teraFLOPS* | 1,000 teraFLOPS
1,600 teraFLOPS* | 800 teraFLOPS
FP8 Tensor Core
4,000 teraFLOPS* | 2,000 teraFLOPS
3,200 teraFLOPS* | 1,600 teraFLOPS
INT8 Tensor Core
4,000 TOPS* | 2,000 TOPS
3,200 TOPS* | 1,600 TOPS
GPU memory
80 GB
80 GB
GPU memory bandwidth
3 TB/s
2 TB/s
Decoders
7 NVDEC, 7 JPEG
7 NVDEC, 7 JPEG
Max. Thermal Design Power (TDP)
700 W
350 W
Multi-Instance GPU (MIG)
Up to 7 MIGs with 10 GB each
Up to 7 MIGs with 10 GB each
Form factor
SXM
PCIe
Connectivity
NVLink: 900 GB/s; PCIe Gen5: 128 GB/s
NVLink: 600 GB/s; PCIe Gen5: 128 GB/s
Server options
NVIDIA HGX™ H100 partner and NVIDIA-Certified Systems™ with 4 or 8 GPUs; NVIDIA DGX™ H100 with 8 GPUs
Partner and NVIDIA-Certified Systems with 1–8 GPUs

* With sparsity

Preliminary specifications, may be subject to change

OUR PRODUCT RECOMMENDATIONS