The World’s Most Powerful GPU
The NVIDIA H200 Tensor Core GPU supercharges generative AI and high-performance computing (HPC) workloads with game-changing performance and memory capabilities. As the first GPU with HBM3e, the H200’s larger and faster memory fuels the acceleration of generative AI and large language models (LLMs) while advancing scientific computing for HPC workloads.
NVIDIA Supercharges Hopper, the World’s Leading AI Computing Platform
Based on the NVIDIA Hopper™ architecture, the NVIDIA HGX H200 features the NVIDIA H200 Tensor Core GPU with advanced memory to handle massive amounts of data for generative AI and high-performance computing workloads.
Highlights
Experience Next-Level Performance
Llama2 70B Inference: 1.9x Faster
GPT-3 175B Inference: 1.6x Faster
High-Performance Computing: 110x Faster
Benefits
Higher Performance and Larger, Faster Memory
Based on the NVIDIA Hopper architecture, the NVIDIA H200 is the first GPU to offer 141 gigabytes (GB) of HBM3e memory at 4.8 terabytes per second (TB/s). That is nearly double the capacity of the NVIDIA H100 Tensor Core GPU, with 1.4X more memory bandwidth. The H200’s larger and faster memory accelerates generative AI and LLMs, while advancing scientific computing for HPC workloads with better energy efficiency and lower total cost of ownership.
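As a quick sanity check on the ratios quoted above, the following sketch compares the H200 figures from this page against H100 SXM figures of 80 GB and 3.35 TB/s. The H100 numbers are an assumption brought in from outside this page, and the variable names are purely illustrative.

```python
# H200 figures come from this page; H100 SXM figures are assumed (not stated here).
H200_MEMORY_GB = 141
H200_BANDWIDTH_TBPS = 4.8
H100_MEMORY_GB = 80          # assumption: H100 SXM memory capacity
H100_BANDWIDTH_TBPS = 3.35   # assumption: H100 SXM memory bandwidth

capacity_ratio = H200_MEMORY_GB / H100_MEMORY_GB
bandwidth_ratio = H200_BANDWIDTH_TBPS / H100_BANDWIDTH_TBPS

print(f"Capacity ratio:  {capacity_ratio:.2f}x")   # 1.76x
print(f"Bandwidth ratio: {bandwidth_ratio:.2f}x")  # 1.43x
```

Under those assumed H100 values, the results line up with the "nearly double" capacity and "1.4X" bandwidth claims.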
Unlock Insights with High-Performance LLM Inference
In the ever-evolving landscape of AI, businesses rely on LLMs to address a diverse range of inference needs. An AI inference accelerator must deliver the highest throughput at the lowest TCO when deployed at scale for a massive user base.
The H200 boosts inference speed by up to 2X compared to H100 GPUs when handling LLMs like Llama2.
Preliminary measured performance, subject to change.
Llama2 13B: ISL 128, OSL 2K | Throughput | H100 1x GPU BS 64 | H200 1x GPU BS 128
GPT-3 175B: ISL 80, OSL 200 | x8 H100 GPUs BS 64 | x8 H200 GPUs BS 128
Llama2 70B: ISL 2K, OSL 128 | Throughput | H100 1x GPU BS 8 | H200 1x GPU BS 32.
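One way to see why memory bandwidth matters for LLM inference: during token-by-token decoding, each generated token requires reading roughly all model weights from GPU memory, so bandwidth divided by weight size gives a rough ceiling on tokens per second for a single sequence. The sketch below is a back-of-envelope model under that assumption, not NVIDIA's benchmark methodology. The function name, the choice of 1 byte per parameter, and the omission of KV-cache traffic and compute time are all simplifications introduced here.

```python
def decode_tokens_per_second_bound(params_billion, bytes_per_param, bandwidth_tbps):
    """Rough upper bound on single-sequence decode rate when weight reads dominate."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    bandwidth_bytes_per_s = bandwidth_tbps * 1e12
    return bandwidth_bytes_per_s / weight_bytes

# Llama2 70B with 1 byte per parameter (FP8 weights, an assumption) at 4.8 TB/s.
bound = decode_tokens_per_second_bound(70, 1, 4.8)
print(f"{bound:.1f} tokens/s")  # 68.6 tokens/s
```

Batching amortizes each weight read across several sequences, so aggregate throughput can exceed this single-sequence bound. The larger batch sizes listed for the H200 in the notes above (for example, BS 32 versus BS 8 for Llama2 70B) are consistent with that, though this simple model does not attempt to reproduce the benchmark figures.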
Supercharge High-Performance Computing
Memory bandwidth is crucial for HPC applications as it enables faster data transfer, reducing complex processing bottlenecks. For memory-intensive HPC applications like simulations, scientific research, and artificial intelligence, the H200’s higher memory bandwidth ensures that data can be accessed and manipulated efficiently, leading to up to 110X faster time to results compared to CPUs.
Projected performance, subject to change.
HPC MILC - dataset NERSC Apex Medium | HGX H200 4-GPU | dual Sapphire Rapids 8480
HPC Apps - CP2K: dataset H2O-32-RI-dRPA-96points | GROMACS: dataset STMV | ICON: dataset r2b5 | MILC: dataset NERSC Apex Medium | Chroma: dataset HMC Medium | Quantum Espresso: dataset AUSURF112 | 1x H100 | 1x H200.
Reduce Energy and TCO
With the introduction of the H200, energy efficiency and TCO reach new levels. This cutting-edge technology offers unparalleled performance, all within the same power profile as the H100. AI factories and supercomputing systems that are not only faster but also more eco-friendly deliver an economic edge that propels the AI and scientific community forward.
Performance
Perpetual Innovation Brings Perpetual Performance Gains
Single-node HGX measured performance | A100 April 2021 | H100 TensorRT-LLM Oct 2023 | H200 TensorRT-LLM Oct 2023
The NVIDIA Hopper architecture delivers an unprecedented performance leap over its predecessor and continues to raise the bar through ongoing software enhancements with the H100, including the recent release of powerful open-source libraries like NVIDIA TensorRT-LLM™. The introduction of the H200 continues the momentum with more performance. Investment in it ensures performance leadership now and, with continued improvements to supported software, in the future.
Enterprise-Ready: AI Software Streamlines Development and Deployment
NVIDIA AI Enterprise, together with NVIDIA H200, simplifies the building of an AI-ready platform, accelerating AI development and deployment of production-ready generative AI, computer vision, speech AI, and more. Together, they deliver enterprise-grade security, manageability, stability, and support to gather actionable insights faster and achieve tangible business value sooner.
Specifications
NVIDIA H200 Tensor Core GPU
Form Factor | H200 SXM¹ |
---|---|
FP64 Tensor Core | 67 TFLOPS |
FP32 | 67 TFLOPS |
TF32 Tensor Core | 989 TFLOPS² |
BFLOAT16 Tensor Core | 1,979 TFLOPS² |
FP16 Tensor Core | 1,979 TFLOPS² |
FP8 Tensor Core | 3,958 TFLOPS² |
INT8 Tensor Core | 3,958 TFLOPS² |
GPU Memory | 141GB |
GPU Memory Bandwidth | 4.8TB/s |
Decoders | 7 NVDEC, 7 JPEG |
Max Thermal Design Power (TDP) | Up to 700W (configurable) |
Multi-Instance GPUs | Up to 7 MIGs @16.5GB each |
Form Factor | SXM |
Interconnect | NVIDIA NVLink®: 900GB/s, PCIe Gen5: 128GB/s |
Server Options | NVIDIA HGX™ H200 partner and NVIDIA-Certified Systems™ with 4 or 8 GPUs |
NVIDIA AI Enterprise | Add on |
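
To put the Multi-Instance GPU row in context, the short sketch below totals the memory across the listed MIG instances and compares it with the full GPU memory. It uses only figures from the table above, and the variable names are illustrative. The table does not say how the remaining memory is used, so the sketch makes no claim about it.

```python
# Figures taken from the specification table above.
TOTAL_MEMORY_GB = 141
MIG_INSTANCES = 7
MIG_MEMORY_GB = 16.5

mig_total_gb = MIG_INSTANCES * MIG_MEMORY_GB
mig_fraction = mig_total_gb / TOTAL_MEMORY_GB

print(f"Memory across all MIG instances: {mig_total_gb} GB")  # 115.5 GB
print(f"Share of total GPU memory: {mig_fraction:.1%}")       # 81.9%
```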