FASTER AI. LOWER COSTS.

Demand for increasingly sophisticated AI-enabled services such as image and speech recognition, natural language processing, visual search, and personalized recommendations is exploding. At the same time, data sets are growing larger, networks are becoming more complex, and latency requirements are becoming more stringent to meet user expectations.

NVIDIA's inference platform delivers the performance, efficiency, and responsiveness critical to powering next-generation AI products and services - in the cloud, in the data center, at the network's edge, and in autonomous machines.

HARNESS THE FULL POTENTIAL OF NVIDIA GPUS WITH NVIDIA TENSORRT

TensorRT is a high-performance inference platform that is key to unlocking the power of NVIDIA Tensor Core GPUs. Compared to CPU-only platforms, it delivers up to 40x higher throughput while minimizing latency. With TensorRT, you can start from any framework and rapidly optimize, validate, and deploy trained neural networks in production.
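One of the core optimizations TensorRT applies is layer and tensor fusion, collapsing several operations into a single kernel. As a toy illustration of the idea (this is plain NumPy, not the TensorRT API), two consecutive linear layers can be fused algebraically into one equivalent layer, halving the work per inference:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two consecutive linear layers: y = W2 @ (W1 @ x + b1) + b2
W1, b1 = rng.standard_normal((4, 8)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((3, 4)), rng.standard_normal(3)

# Fused into a single equivalent layer: y = Wf @ x + bf
Wf = W2 @ W1
bf = W2 @ b1 + b2

x = rng.standard_normal(8)
y_unfused = W2 @ (W1 @ x + b1) + b2
y_fused = Wf @ x + bf

# Both paths produce the same result, but the fused layer
# needs one matrix multiply at inference time instead of two.
assert np.allclose(y_unfused, y_fused)
```

TensorRT performs far more aggressive versions of this (e.g., fusing convolution, bias, and activation into one kernel, plus precision calibration to FP16/INT8), but the principle is the same: do the algebra once at build time so the deployed engine does less work per request.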

POWERFUL, UNIFIED, AND SCALABLE DEEP LEARNING INFERENCE

With a single unified architecture, neural networks can be trained in any deep learning framework, optimized with NVIDIA TensorRT, and then deployed for real-time inference at the edge. With NVIDIA DGX™ Systems, NVIDIA Tensor Core GPUs, NVIDIA Jetson™, and NVIDIA DRIVE™, NVIDIA provides an end-to-end, fully scalable platform for deep learning, as demonstrated in the MLPerf benchmark suite.

EASIER DEPLOYMENT WITH THE NVIDIA TRITON INFERENCE SERVER

The NVIDIA Triton Inference Server, previously known as TensorRT Inference Server, is open source software that simplifies the deployment of deep learning models in production. With Triton Inference Server, teams can deploy trained AI models from any framework (TensorFlow, PyTorch, TensorRT Plan, Caffe, MXNet, or custom) from on-premises storage, Google Cloud Platform, or AWS S3 to any GPU- or CPU-based infrastructure. Triton Inference Server runs multiple models simultaneously on a single GPU to maximize utilization and integrates with Kubernetes for orchestration, metrics, and auto-scaling.
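Triton serves models from a model repository: a directory per model containing versioned subdirectories and a `config.pbtxt` describing the backend and batching behavior. A minimal sketch for a hypothetical TensorRT model (the model name and batch size here are illustrative, not from the source):

```
model_repository/
└── resnet50/
    ├── config.pbtxt
    └── 1/
        └── model.plan
```

```
# config.pbtxt (illustrative values)
name: "resnet50"
platform: "tensorrt_plan"
max_batch_size: 32
dynamic_batching { }
```

The empty `dynamic_batching` block enables Triton's server-side batching with default settings, letting it coalesce individual requests into larger batches to raise GPU utilization.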

ENORMOUS COST SAVINGS

To ensure maximum server productivity, data center managers must carefully balance performance and efficiency. A single NVIDIA Tesla T4 server can replace multiple off-the-shelf CPU servers for deep learning inference applications and services, reducing energy requirements and cutting both acquisition and operating costs.