FASTER AI. LOWER COSTS.

Demand for increasingly sophisticated AI-enabled services such as image and speech recognition, natural language processing, visual search and personalised recommendations is exploding. At the same time, datasets are getting larger, networks are becoming more complex and latency requirements are becoming more stringent to meet user expectations.

NVIDIA's inference platform delivers the performance, efficiency and responsiveness critical to powering next-generation AI products and services in the cloud, in the data centre, at the network edge and in autonomous machines.

HARNESS THE FULL POTENTIAL OF NVIDIA GPUS WITH NVIDIA TENSORRT

TensorRT is a platform for high-performance deep learning inference and is key to unlocking the power of NVIDIA Tensor Core GPUs. Compared to CPU-only platforms, it offers up to 40 times more throughput while minimising latency. With TensorRT, you can start from any framework and quickly optimise, validate and deploy trained neural networks in production.
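
As a minimal sketch of that workflow, the following assumes a trained network already exported to ONNX as model.onnx (the file name is hypothetical) and the TensorRT 8.x Python API; exact calls vary between TensorRT versions.

    import tensorrt as trt

    # Create a logger and builder (TensorRT 8.x-style API).
    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

    # Parse the trained model from ONNX into a TensorRT network.
    parser = trt.OnnxParser(network, logger)
    with open("model.onnx", "rb") as f:  # hypothetical file name
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError("ONNX parse failed")

    # Configure the optimisation, e.g. allow reduced FP16 precision.
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)

    # Build and serialise the optimised inference engine for deployment.
    engine_bytes = builder.build_serialized_network(network, config)
    with open("model.plan", "wb") as f:
        f.write(engine_bytes)

The resulting .plan file is the "TensorRT Plan" format that Triton Inference Server, described below, can serve directly.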

EASIER DEPLOYMENT WITH THE NVIDIA TRITON INFERENCE SERVER

NVIDIA Triton Inference Server, previously known as TensorRT Inference Server, is open-source software that simplifies the deployment of deep learning models in production. With Triton Inference Server, teams can deploy trained AI models from any framework (TensorFlow, PyTorch, TensorRT Plan, Caffe, MXNet or custom) and from on-premises storage, Google Cloud Platform or AWS S3 to any GPU- or CPU-based infrastructure. Triton Inference Server runs multiple models concurrently on a single GPU to maximise utilisation, and integrates with Kubernetes for orchestration, metrics and auto-scaling.
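
As a hedged illustration, the sketch below sends a single inference request to a running Triton server using the tritonclient Python package; the model name resnet50 and the tensor names input__0 / output__0 are placeholders that depend on the deployed model's configuration.

    import numpy as np
    import tritonclient.http as httpclient  # pip install tritonclient[http]

    # Connect to a Triton server assumed to be listening on localhost:8000.
    client = httpclient.InferenceServerClient(url="localhost:8000")

    # Prepare one input tensor; name, shape and dtype are model-specific placeholders.
    image = np.random.rand(1, 3, 224, 224).astype(np.float32)
    inp = httpclient.InferInput("input__0", list(image.shape), "FP32")
    inp.set_data_from_numpy(image)

    # Run inference and read back the output tensor by name.
    result = client.infer(model_name="resnet50", inputs=[inp])
    scores = result.as_numpy("output__0")
    print(scores.shape)

Because the server multiplexes requests across all loaded models, many such clients can share one GPU, which is how Triton drives up utilisation.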

POWERFUL, CONSISTENT AND SCALABLE DEEP LEARNING INFERENCE

With a single unified architecture, neural networks can be trained on any deep learning framework, optimised with NVIDIA TensorRT and then deployed for real-time inference at the edge. With NVIDIA DGX™ systems, NVIDIA Tensor Core GPUs, NVIDIA Jetson™ and NVIDIA DRIVE™, NVIDIA provides an end-to-end, fully scalable deep learning platform, as demonstrated in the MLPerf benchmark suite.

ENORMOUS COST SAVINGS

To ensure maximum server productivity, data centre managers must carefully balance performance and efficiency. A single server with an NVIDIA Tesla T4 GPU can replace multiple commodity CPU servers for deep learning inference applications and services, reducing power requirements and cutting both acquisition and operating costs.
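
To make that trade-off concrete, a back-of-the-envelope comparison under purely hypothetical throughput and power figures (none of these numbers come from the text above) might look as follows.

    import math

    # Hypothetical aggregate workload: 60,000 inferences per second.
    required_throughput = 60_000

    # Hypothetical per-server figures; real values depend on model and hardware.
    cpu_throughput, cpu_power = 500, 400    # inf/s and watts per CPU server
    gpu_throughput, gpu_power = 5_000, 300  # inf/s and watts per T4 server

    cpu_servers = math.ceil(required_throughput / cpu_throughput)  # 120 servers
    gpu_servers = math.ceil(required_throughput / gpu_throughput)  # 12 servers

    # Annual energy use in kWh for each fleet, assuming 24/7 operation.
    hours_per_year = 24 * 365
    cpu_kwh = cpu_servers * cpu_power * hours_per_year / 1000
    gpu_kwh = gpu_servers * gpu_power * hours_per_year / 1000

    print(f"CPU fleet: {cpu_servers} servers, {cpu_kwh:,.0f} kWh/year")
    print(f"GPU fleet: {gpu_servers} servers, {gpu_kwh:,.0f} kWh/year")

Under these assumed figures the GPU fleet needs a tenth of the servers and a fraction of the energy, which is the kind of arithmetic behind the cost-savings claim.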

DATA CENTRE

SELF-DRIVING CARS

INTELLIGENT VIDEO ANALYTICS

EMBEDDED DEVICES