Maximizing GPU Utilization for Data Center Inference with NVIDIA TensorRT Inference Server on GKE with Kubeflow

Whether it’s performing object detection in images or video, recommending restaurants, or translating the spoken word, inference is the mechanism that allows applications to derive valuable information from trained AI models. But many inference solutions are one-off designs that lack the performance and flexibility to be seamlessly deployed in modern data center environments.

NVIDIA TensorRT Inference Server lets you add inference to your application without reinventing the wheel. It is delivered both as a ready-to-deploy container from NGC, NVIDIA’s registry for GPU-accelerated software containers, and as an open source project. NVIDIA TensorRT Inference Server is a microservice that enables applications to use AI models in data center production. It maximizes GPU utilization, supports all popular AI frameworks and model types, and provides packaging and documentation for deployment using Kubeflow.

By watching this webinar replay, you'll learn:

The internal architecture of TensorRT Inference Server and how it fits into a larger inference workflow with Kubernetes, Kubeflow, and other load balancing and container orchestration solutions;
How TensorRT Inference Server maximizes GPU utilization with dynamic batching and concurrent model execution on each GPU; and
Best practices for using metrics from TensorRT Inference Server for autoscaling, health, and utilization, with a demonstration using Kubeflow and Google Kubernetes Engine (GKE).
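
To make the idea of dynamic batching concrete, here is a minimal Python sketch of the general technique: individual requests that arrive close together are grouped into one batch so the GPU can process them in a single pass. This is an illustration only, not TensorRT Inference Server's implementation. The names `Request`, `form_batches`, `max_batch_size`, and `max_delay_ms` are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    request_id: int
    arrival_ms: float  # arrival time in milliseconds (hypothetical field)


def form_batches(requests: List[Request],
                 max_batch_size: int,
                 max_delay_ms: float) -> List[List[Request]]:
    """Group requests into batches in arrival order.

    A batch is closed when it is full, or when the next request arrived
    more than max_delay_ms after the first request in the batch.
    """
    batches: List[List[Request]] = []
    current: List[Request] = []
    for req in sorted(requests, key=lambda r: r.arrival_ms):
        batch_full = len(current) >= max_batch_size
        waited_too_long = bool(current) and (
            req.arrival_ms - current[0].arrival_ms > max_delay_ms
        )
        if current and (batch_full or waited_too_long):
            batches.append(current)
            current = []
        current.append(req)
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    reqs = [Request(i, t) for i, t in enumerate([0.0, 1.0, 2.0, 3.0, 50.0])]
    for batch in form_batches(reqs, max_batch_size=3, max_delay_ms=5.0):
        print([r.request_id for r in batch])
```

The two limits in this sketch trade throughput against latency: a larger batch size keeps the GPU busier per pass, while a shorter delay bounds how long an early request waits for others to arrive.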


Content

DGX Station Datasheet

Get a quick low-down and technical specs for the DGX Station.

DGX Station Whitepaper

Dive deeper into the DGX Station and learn more about the architecture, NVLink, frameworks, tools and more.

Speaker

Tripti Singhal

Product Manager, NVIDIA

Tripti Singhal is a product manager on the NVIDIA Deep Learning Software team, working on TensorRT Inference Server. Before moving to the product team, she was a Deep Learning Solutions Architect at NVIDIA. She received her bachelor's degree in Computer Science from the University of California, Santa Barbara.

