FASTER AI. LOWER COSTS.

The AI revolution is in full swing, creating new opportunities for companies to redefine how they deal with customer challenges. It's a future where every customer interaction, product, and service offering is touched and improved by AI.

GPUs have proven amazingly efficient at solving the most complex deep learning problems, and NVIDIA's Deep Learning platform is currently the industry standard training solution.

The potential for artificial intelligence (AI) to lift any industry to a new level of development is greater than ever. The demand for AI is enormous, spanning everything from the more than one billion smart city cameras supporting public safety, to the more than $100 billion lost annually to retail theft, to the 500 million calls handled per day in contact centers. Inference is key to making consumers' lives more convenient, preventing lost sales, and driving operational efficiencies as we move toward an AI economy.

However, developing inference solutions from concept to deployment is not easy.
Many individual and disparate components must work in harmony to achieve a successful inference deployment: model selection, application constraints, training and optimization in a framework, deployment strategy, target processor, and orchestration and management middleware. The lack of a unified workflow across all these parts of the inference equation is an obstacle for enterprises and cloud service providers (CSPs) trying to meet massive inference demand.


NVIDIA's inference platform delivers the performance, efficiency, and responsiveness critical to powering next-generation AI products and services, whether in the cloud, in the data center, at the network edge, or in autonomous machines.

HARNESS THE FULL POTENTIAL OF NVIDIA GRAPHICS PROCESSORS WITH NVIDIA TENSORRT

TensorRT is a high-performance inference platform that is key to unlocking the power of NVIDIA Tensor Core GPUs. Compared to CPU-only platforms, it delivers up to 40x higher throughput while minimizing latency. With TensorRT, you can start from any framework and quickly optimize, validate, and deploy trained neural networks in production.
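As an illustration of that workflow, here is a minimal sketch of building a TensorRT engine from an ONNX export of a trained network using the TensorRT Python API; the file names "model.onnx" and "model.plan" are placeholders, and FP16 is enabled only as an example of the available optimizations.

import tensorrt as trt

# Minimal sketch: parse an ONNX model and build a serialized TensorRT engine.
# "model.onnx" and "model.plan" are placeholder paths for this example.
TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path="model.onnx", plan_path="model.plan"):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
    )
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("Failed to parse the ONNX model")

    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)  # example optimization: mixed precision

    with open(plan_path, "wb") as f:
        f.write(builder.build_serialized_network(network, config))

if __name__ == "__main__":
    build_engine()

The serialized plan file can then be loaded by the TensorRT runtime or served directly by Triton Inference Server, described below.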

EASIER DEPLOYMENT WITH THE NVIDIA TRITON INFERENCE SERVER

The NVIDIA Triton Inference Server, previously known as TensorRT Inference Server, is open source software that simplifies the deployment of deep learning models in production. With Triton Inference Server, teams can deploy trained AI models from any framework (TensorFlow, PyTorch, TensorRT plan, Caffe, MXNet, or custom) from on-premises storage, Google Cloud Platform, or AWS S3 onto any GPU- or CPU-based infrastructure. Triton Inference Server runs multiple models concurrently on a single GPU to maximize utilization and integrates with Kubernetes for orchestration, metrics, and auto-scaling.
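As a concrete illustration, the sketch below sends an inference request to a running Triton server using the official Python HTTP client from the tritonclient package; the model name "resnet50" and the tensor names "input" and "output" are assumptions standing in for whatever is defined in your model repository's configuration.

import numpy as np
import tritonclient.http as httpclient

# Sketch of a Triton HTTP client call. Assumes a Triton server is already
# running on localhost:8000 and serving a model named "resnet50" whose
# input/output tensors are called "input" and "output" (placeholder names).
client = httpclient.InferenceServerClient(url="localhost:8000")

batch = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy image batch

inputs = [httpclient.InferInput("input", list(batch.shape), "FP32")]
inputs[0].set_data_from_numpy(batch)
outputs = [httpclient.InferRequestedOutput("output")]

result = client.infer(model_name="resnet50", inputs=inputs, outputs=outputs)
print(result.as_numpy("output").shape)

The same request could be sent over gRPC with tritonclient.grpc, and the server itself is typically launched from the tritonserver container pointed at a model repository directory.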

POWERFUL, UNIFIED AND SCALABLE DEEP LEARNING INFERENCE

With a single unified architecture, neural networks can be trained in any deep learning framework, optimized with NVIDIA TensorRT, and then deployed for real-time inference at the edge. With NVIDIA DGX™ systems, NVIDIA Tensor Core GPUs, NVIDIA Jetson™, and NVIDIA DRIVE™, NVIDIA provides an end-to-end, fully scalable deep learning platform, as demonstrated in the MLPerf benchmark suite.


ENORMOUS COST SAVINGS

To ensure maximum server productivity, data center managers must carefully balance performance and efficiency. A single NVIDIA Tesla T4 server can replace multiple commodity CPU servers for deep learning inference applications and services, reducing power requirements and cutting both acquisition and operating costs.
NVIDIA AI provides a complete end-to-end stack and suite of products and services that deliver the performance, efficiency, and responsiveness critical to next-generation AI inference, whether in the cloud, in the data center, at the network edge, or in embedded devices. It combines architectural innovation designed specifically to accelerate AI inference workloads with an end-to-end software stack built for the data scientists, software developers, and infrastructure engineers who work across the stages from prototyping to production, with varying levels of AI expertise and experience.

NVIDIA's updates to the GPU product portfolio and stack offerings, including TensorRT and Triton™ Inference Server, extend its leadership in delivering optimized, end-to-end inference solutions for the cloud, data center, and edge. The NVIDIA AI solution stack and updates include:

● NVIDIA Train, Adapt, and Optimize (TAO), a zero-code solution for AI model creation. With a user interface and guided workflow, TAO enables developers to train, adapt, and optimize pre-trained AI models for computer vision and conversational AI for their use case in a fraction of the time, with just a few clicks, and without AI expertise or large datasets.

● NVIDIA TensorRT, an SDK for high-performance deep learning inference that includes an inference optimizer and runtime environment, enabling AI developers to import trained models from all major deep learning frameworks and optimize them for use in the cloud, data center, and edge.
The latest version, 8.2, includes new optimizations for running language models with billions of parameters, such as T5 and GPT, in real time, as well as integrations with PyTorch and TensorFlow. With these integrations, millions of developers can achieve three times faster inference performance with just one line of code, as sketched below.
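For the PyTorch side of that integration, a minimal Torch-TensorRT sketch might look like the following; the choice of torchvision's ResNet-50 and FP16 precision are assumptions made for illustration, and the actual speedup depends on the model and GPU.

import torch
import torch_tensorrt
import torchvision.models as models

# Assumption for illustration: a torchvision ResNet-50 stands in for any trained model.
model = models.resnet50(weights="IMAGENET1K_V1").eval().cuda()
example_input = torch.randn(1, 3, 224, 224, device="cuda")

# The "one line": compile the eager PyTorch module into a TensorRT-optimized one.
trt_model = torch_tensorrt.compile(
    model, inputs=[example_input], enabled_precisions={torch.half}
)

with torch.no_grad():
    output = trt_model(example_input)
print(output.shape)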

● NVIDIA Triton Inference Server, which simplifies production-scale AI model deployment.
As open-source inference software, Triton Inference Server enables teams to deploy trained AI models from any framework, from local storage or a cloud platform, onto any GPU- or CPU-based infrastructure (cloud, data center, or edge). The latest Triton release includes the following enhancements to further optimize inference performance with NVIDIA AI:
  • Model Analyzer helps determine optimal model execution parameters (precision, batch size, number of concurrent model instances, and client request concurrency) given latency, throughput, and memory constraints (a CLI sketch follows this list).
  • Support for the RAPIDS Forest Inference Library (FIL) backend for executing inference on tree-based models (gradient boosted decision trees and random forests).
  • Support for distributed inference with multiple GPUs and nodes for giant transformer-based language models such as GPT-3.
  • Availability in Amazon SageMaker, allowing Triton to be used for deploying models in the SageMaker AI platform.
  • Triton is also now available in all major cloud platforms.
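As a rough sketch of the Model Analyzer workflow referenced above, a profiling run might look like the command below; the repository path and model name are placeholders, and the flags shown are the basic ones from the Model Analyzer documentation.

model-analyzer profile \
    --model-repository /path/to/model_repository \
    --profile-models resnet50

Latency, throughput, and GPU memory constraints, along with the search ranges for batch size and instance count, can be supplied through Model Analyzer's YAML configuration file, and the tool then reports the model configurations that best satisfy them.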
NVIDIA-certified systems enable enterprises to deploy hardware solutions that run their modern accelerated workloads and NVIDIA AI securely and optimally. They combine NVIDIA GPUs and NVIDIA networking in servers from leading NVIDIA partners in validated, optimized configurations. These servers are validated for performance, manageability, security, and scalability, and are backed by enterprise-grade support from NVIDIA and our partners. Certification now includes two new categories for edge systems: enterprise edge for servers in controlled environments and industrial edge for systems in industrial or harsh environments. With an NVIDIA-certified system, enterprises can confidently choose GPU-accelerated solutions to power their inference workloads, no matter where they run.
NVIDIA AI Enterprise is an end-to-end, cloud-native suite of AI and data science tools and frameworks optimized and certified by NVIDIA to run exclusively on VMware vSphere with mainstream NVIDIA-certified systems. NVIDIA AI Enterprise, licensed and supported by NVIDIA, includes key NVIDIA technologies and software for rapidly deploying, managing, and scaling AI and inference workloads in the modern hybrid cloud. The NVIDIA TensorRT SDK and Triton Inference Server are both available as part of the NVIDIA AI Enterprise suite.

COMPLETE INFERENCE PORTFOLIO

NVIDIA offers a complete portfolio of NVIDIA-certified systems using Ampere and Hopper Tensor Core GPUs as the inference engines for NVIDIA AI. The introduction of the A2 Tensor Core GPU expands the NVIDIA AI portfolio, which already includes the H100, A100, and A30 Tensor Core GPUs, with an entry-level inference engine in a low-profile form factor. With a low power consumption of up to 40W, the A2 fits into any server, making it ideal for far-edge servers. The A100 provides the highest inference performance at any scale for compute-intensive applications, and the A30 provides optimal inference performance for mainstream servers. NVIDIA-certified systems with the NVIDIA H100, A100, A30, and A2 Tensor Core GPUs deliver leading inference performance in the cloud, data center, and edge, ensuring that AI-enabled applications can be deployed with fewer servers and less power consumption, resulting in faster insights at a significantly lower cost.

INFERENCE SOLUTIONS

Data center

Self-driving cars

Intelligent video analysis

Embedded systems