Explore NVIDIA’s approach to high-performance, revenue-generating AI infrastructure.
Think SMART Inference Signals: Latest News
Moving Beyond Single-Node to Multi-Node Inference
NVIDIA Dynamo Scales and Streamlines Data Center Inference with Kubernetes
As AI inference becomes increasingly distributed, the combination of Kubernetes, NVIDIA Dynamo, and NVIDIA Grove greatly simplifies how developers build and scale intelligent applications. NVIDIA Dynamo now integrates with managed Kubernetes services from Amazon EKS, Microsoft Azure AKS, Google Cloud GKE, and Oracle Cloud Infrastructure OKE, making it easier to get started with large-scale inference.
Read Blog
Top Inference Highlights
Breaking the Million-Token Barrier: The Business Impact of Azure's NVIDIA GB300 Performance for Enterprise AI
Microsoft Azure achieved 1,100,948 tokens/sec on ND GB300 v6 racks powered by 72 NVIDIA Blackwell Ultra GPUs, validated by Signal65. This benchmark highlights how enterprise AI can deliver record throughput with ~2.5x better power efficiency, combining high performance, operational efficiency, and governance-ready scale.

Read Blog Post 
Published November 3, 2025
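A quick back-of-envelope on what that aggregate figure implies per GPU; the division below is our own arithmetic, not a metric from the Signal65 report:

```python
# Rough per-GPU throughput implied by the Azure ND GB300 v6 result.
# The aggregate figure comes from the benchmark above; the per-GPU
# breakdown is our own back-of-envelope, not a reported metric.
aggregate_tokens_per_sec = 1_100_948   # whole-rack throughput
gpus_per_rack = 72                     # NVIDIA Blackwell Ultra GPUs per rack

per_gpu = aggregate_tokens_per_sec / gpus_per_rack
print(f"~{per_gpu:,.0f} tokens/sec per GPU")   # ~15,291 tokens/sec per GPU
```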
NVIDIA Extreme Co-Design Delivers X-Factors on One-Year Rhythm
How do you get 10x the performance with only twice the transistors? Extreme co-design. At GTC DC, NVIDIA CEO Jensen Huang showed how the NVIDIA GB200 NVL72 architecture delivers a massive leap in inference performance—creating the lowest-cost AI tokens in the world while driving 10x higher throughput.

Watch Keynote Chapter 
Published October 28, 2025
Latest Inference News and Resources
Barron’s Highlights NVIDIA’s Inference Leadership
Barron’s explores NVIDIA’s inference leadership, with the NVIDIA GB200 NVL72 sweeping the SemiAnalysis InferenceMAX v1 benchmarks and delivering unmatched performance, efficiency, and ROI for AI inference.

Read Article 
Published October 15, 2025
The Next Platform: Software Pushes the AI Pareto Frontier More Than Hardware
The Next Platform details how NVIDIA software optimizations are boosting performance by 5–10x on the same hardware, with Pareto curves illustrating how software and hardware optimizations together advance AI inference performance.

Read Article 
Published October 21, 2025
Google Cloud Now Shipping A4X Max, Vertex AI Training, and More
Google Cloud's new A4X Max VMs, powered by NVIDIA GB300 NVL72 systems, are now in preview. A4X Max is designed for training and low-latency AI inference of frontier reasoning models. Further integration with GKE Inference Gateway and NVIDIA NeMo™ Guardrails enables prefix-aware load balancing, significantly improving latency and throughput.

Read Article 
Published October 28, 2025
Siemens Builds and Deploys Self-Contained, Sustainable, and Cost-Effective LLM
Siemens details its efforts to build a future-proof AI ecosystem and provide services for its internal developers. Its sovereign AI infrastructure focuses on data privacy, compliance, cost predictability, and customization—served using vLLM and powered by NVIDIA H200 Tensor Core GPUs and L40S GPUs.

Read Article 
Published October 13, 2025
NVIDIA Inference Technology Highlights
Streamline Complex AI Inference on Kubernetes With NVIDIA Grove
NVIDIA Grove, a Kubernetes API for running modern machine-learning inference workloads, is now available within NVIDIA Dynamo as a modular component for unified inference management. Grove is fully open source and enables multinode disaggregated serving through multilevel autoscaling, system-level lifecycle management, flexible gang scheduling, topology-aware scheduling, and role-aware orchestration.

Read Blog 
Published November 10, 2025
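For flavor, here is a hypothetical sketch of submitting a Grove-style disaggregated-serving workload through the Kubernetes Python client. The API group, kind, and every spec field below are illustrative placeholders, not the actual Grove schema; see the Grove documentation in the Dynamo project for the real API.

```python
# Hypothetical sketch of a Grove-style disaggregated-serving workload.
# The group/version/kind and all spec fields are illustrative placeholders,
# NOT the real Grove schema; consult the Grove docs for the actual API.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

workload = {
    "apiVersion": "grove.example.io/v1alpha1",   # placeholder group/version
    "kind": "InferenceWorkload",                 # placeholder kind
    "metadata": {"name": "llm-disagg-demo"},
    "spec": {
        # Disaggregated serving: prefill and decode run as separate roles,
        # gang-scheduled together and autoscaled at multiple levels.
        "roles": [
            {"name": "prefill", "replicas": 2, "gpusPerReplica": 4},
            {"name": "decode",  "replicas": 4, "gpusPerReplica": 2},
        ],
    },
}

api.create_namespaced_custom_object(
    group="grove.example.io", version="v1alpha1",
    namespace="default", plural="inferenceworkloads", body=workload,
)
```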
STAC-ing Wins: NVIDIA GH200 Superchip Sets Records on Financial Services Industry Benchmarks
STAC audited a STAC-ML Markets (Inference) benchmark on a Supermicro system featuring the NVIDIA GH200 Grace Hopper™ Superchip. Compared to the previous FPGA-based record, GH200 delivered up to 49% lower latency on large models, 44% higher energy efficiency, 8–13x lower inference error rates, and latency as low as 4.67 μs at the 99th percentile.

Read Blog 
Published October 28, 2025
Streamline AI Infrastructure With NVIDIA Run:ai on Microsoft Azure
NVIDIA Run:ai integrates with Azure Kubernetes Service (AKS) to dynamically manage GPU resources, allowing multiple workloads to share GPUs efficiently and supporting multi-node and multi-GPU training jobs.

Read Blog 
Published October 30, 2025
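As a minimal sketch of what GPU sharing looks like in practice, the pod below requests half a GPU under Run:ai. The gpu-fraction annotation and runai-scheduler scheduler name follow Run:ai's documented fractional-GPU pattern as we understand it; treat them, and the container image, as assumptions to verify against current Run:ai docs.

```python
# Hedged sketch: a pod requesting a fractional GPU under Run:ai on AKS.
# The annotation key, scheduler name, and image are assumptions for
# illustration; verify them against the current Run:ai documentation.
from kubernetes import client, config

config.load_kube_config()
pod = client.V1Pod(
    metadata=client.V1ObjectMeta(
        name="half-gpu-inference",
        annotations={"gpu-fraction": "0.5"},   # share one GPU across pods
    ),
    spec=client.V1PodSpec(
        scheduler_name="runai-scheduler",      # Run:ai's scheduler, not the default
        containers=[client.V1Container(
            name="server",
            image="nvcr.io/nvidia/tritonserver:24.08-py3",  # example image
        )],
    ),
)
client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)
```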
NVIDIA and Oracle to Accelerate Enterprise AI and Data Processing
Oracle announced a new OCI Zettascale10 computing cluster powered by the NVIDIA Blackwell platform, designed for AI training and inference workloads. The cluster will deliver up to 16 zettaFLOPS of AI compute and utilize NVIDIA Spectrum-X™ Ethernet, which enables hyperscalers to interconnect millions of NVIDIA GPUs.

Read Blog 
Published October 14, 2025
Scaling Large MoE Models With Wide Expert Parallelism on NVL72 Rack-Scale Systems
NVIDIA TensorRT™-LLM's Wide Expert Parallelism (Wide-EP) on NVIDIA GB200 NVL72 systems achieves up to 1.8x higher per-GPU throughput than smaller EP configurations, improving tokens per second per GPU and lowering the cost to serve reasoning models such as DeepSeek-R1.

Read Blog 
Published October 20, 2025
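The intuition is simple arithmetic: spreading a mixture-of-experts model over more GPUs leaves fewer expert weights resident on each one. The sketch below uses DeepSeek-R1's 256 routed experts; the EP sizes and memory framing are our own illustration, not TensorRT-LLM configuration.

```python
# Illustrative arithmetic behind Wide Expert Parallelism (Wide-EP).
# DeepSeek-R1 routes tokens across 256 experts; the EP sizes below are
# our own example values, not TensorRT-LLM settings.
total_experts = 256

for ep_size in (8, 64):               # small EP vs. wide EP across more GPUs
    experts_per_gpu = total_experts // ep_size
    print(f"EP={ep_size:3d}: {experts_per_gpu:3d} experts per GPU")

# EP=  8:  32 experts per GPU
# EP= 64:   4 experts per GPU
# Fewer resident expert weights per GPU frees memory for KV cache and larger
# batches, one reason wide EP can raise per-GPU token throughput.
```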
Nebius Scales AI Inference in the Cloud, Powered by NVIDIA
Using managed Kubernetes with auto-scaling, Nebius optimizes its AI cloud to deliver multi-node training and inference of frontier models and AI applications for startups and enterprises. Nebius, an ecosystem partner for NVIDIA Dynamo, is enabling AI inference at scale with NVIDIA infrastructure.

Watch Video 
Published November 10, 2025
Spotlight: Pascal Bornet
Why Your $5 Million AI Investment Could Generate $75 Million—If You Understand Inference
AI pioneer Pascal Bornet sits down with Dion Harris, Sr. Director of HPC, Cloud, and AI Infrastructure Solutions GTM at NVIDIA, to discuss AI inference, reasoning models, and how performance and efficiency are the driving factors to maximize return on investment from AI factories.

Watch Interview 
Read LinkedIn Post 
Published November 10, 2025
Follow Us
Facebook | Twitter | YouTube | LinkedIn | NVIDIA Blog
Privacy Center | Manage Preferences | Unsubscribe | Contact Us | View Online
© 2025 NVIDIA Corporation. All rights reserved.
NVIDIA Corporation, 2788 San Tomas Expressway, Santa Clara, CA 95051.