Whether it’s performing object detection in images or video, recommending restaurants, or translating the spoken word, inference is the mechanism that allows applications to derive valuable information from trained AI models. But many inference solutions are one-off designs that lack the performance and flexibility to be seamlessly deployed in modern data center environments.

NVIDIA TensorRT Inference Server lets you add inference to your application without reinventing the wheel. Delivered as a ready-to-deploy container from NGC, NVIDIA’s registry for GPU-accelerated software containers, and as an open-source project, it is a microservice that enables applications to use AI models in data center production. It maximizes GPU utilization, supports all popular AI frameworks and model types, and provides packaging and documentation for deployment using Kubeflow.

By watching this webinar replay, you'll learn:
  • The internal architecture of TensorRT Inference Server and how it fits into a larger inference workflow with Kubernetes, Kubeflow, and other load balancing and container orchestration solutions;
  • How TensorRT Inference Server maximizes GPU utilization with dynamic batching and concurrent model execution on each GPU; and
  • Best practices for using metrics from TensorRT Inference Server for autoscaling, health, and utilization, with a demonstration using Kubeflow and Google Kubernetes Engine (GKE).
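
To make the idea of dynamic batching concrete, here is a minimal sketch in Python. It is not TensorRT Inference Server's scheduler; the names `Request`, `form_batches`, `max_batch_size`, and `max_delay_ms` are hypothetical and chosen only for illustration. The sketch assumes a batch is closed either when it is full or when the next request arrives more than a fixed delay after the batch's first request.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """A single inference request (hypothetical type for this sketch)."""
    request_id: int
    arrival_ms: float


def form_batches(requests: List[Request],
                 max_batch_size: int,
                 max_delay_ms: float) -> List[List[Request]]:
    """Group requests into batches in arrival order.

    A batch is closed when it already holds max_batch_size requests, or
    when the next request arrives more than max_delay_ms after the first
    request in the batch. Both limits are assumptions of this sketch.
    """
    batches: List[List[Request]] = []
    current: List[Request] = []
    for req in sorted(requests, key=lambda r: r.arrival_ms):
        batch_full = len(current) == max_batch_size
        waited_too_long = bool(current) and (
            req.arrival_ms - current[0].arrival_ms > max_delay_ms
        )
        if current and (batch_full or waited_too_long):
            batches.append(current)
            current = []
        current.append(req)
    if current:
        # Flush whatever is left once the request stream ends.
        batches.append(current)
    return batches
```

With a larger `max_delay_ms`, more requests share one GPU pass at the cost of added latency for the earliest request in each batch; a smaller value does the opposite.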




Tripti Singhal

Product Manager, NVIDIA

Tripti Singhal is a product manager on the NVIDIA Deep Learning Software team, working on TensorRT Inference Server. Before moving to the product team, she was a Deep Learning Solutions Architect at NVIDIA. She received her bachelor's degree in Computer Science from the University of California, Santa Barbara.

Date & Time: Wednesday, April 22, 2018