AI & Deep Learning

Introduction

Date: October 17, 2019
Time: 9:00am – 10:00am PDT
Duration: 1 hour


Recent work in unsupervised language modeling, such as BERT and GPT-2, demonstrates that training large neural language models advances the state of the art in Natural Language Processing (NLP) applications. However, memory constraints limit the size of models that can be practically trained on a single processor. Model parallelism makes it possible to train larger models by splitting their parameters across multiple processors, but designing such an approach that is both simple and efficient remains a challenge.
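
The intuition behind splitting parameters can be shown with a short, self-contained PyTorch sketch: one linear layer's weight matrix is split column-wise into two shards, each shard computes its part of the output, and the pieces are combined. The 2-way split, the shapes, and the single-process setup are illustrative assumptions for this page, not the actual Megatron implementation covered in the webinar.

```python
# Minimal sketch of tensor (model) parallelism for one linear layer.
# The 2-way split and all names here are illustrative assumptions,
# not NVIDIA's Megatron code.
import torch

torch.manual_seed(0)

batch, d_in, d_out = 4, 8, 6
x = torch.randn(batch, d_in)

# Full weight of a linear layer: y = x @ W
W = torch.randn(d_in, d_out)

# Column-parallel split: each "processor" would hold half of the output columns.
W0, W1 = W.chunk(2, dim=1)

# Each shard computes its slice of the output independently ...
y0 = x @ W0
y1 = x @ W1

# ... and the partial outputs are concatenated (a collective communication
# step across GPUs in a real distributed setup) to recover the full activation.
y_parallel = torch.cat([y0, y1], dim=1)

# The split computation matches the unsplit one.
assert torch.allclose(y_parallel, x @ W, atol=1e-6)
print("column-parallel output matches:", y_parallel.shape)
```

In a real multi-GPU run the shards would live on different devices and the concatenation would be a communication step; how Megatron organizes such splits efficiently is one of the topics of the webinar.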


Recently, NVIDIA Research launched project Megatron to enable training state-of-the-art transformer language models with billions of parameters. Join this webinar to learn how NVIDIA researchers created Megatron, the largest Transformer language model trained to date, with 8.3 billion parameters (24x the size of BERT and 5.6x the size of GPT-2). Trained on 174 GB of text, the model establishes new state-of-the-art results on tasks such as LAMBADA, which tests the ability to model long-term dependencies.



By attending this webinar, you will:
  • Understand the implementation of efficient model parallelism without any new compiler or model rewriting
  • Learn about the pre-training process for Megatron
  • Learn about zero-shot evaluations using Megatron
  • Get an overview of the PyTorch code used to train this model
  • Grasp the scope of Megatron’s future uses

WEBINAR REGISTRATION

THANK YOU FOR REGISTERING FOR THE WEBINAR



You will receive an email with instructions on how to join the webinar shortly.

DGX Station Datasheet

Get a quick overview and the technical specs for the DGX Station.
DGX Station Whitepaper

Dive deeper into the DGX Station and learn more about the architecture, NVLink, frameworks, tools and more.

Speaker

Dr. Mohammad Shoeybi

Senior Research Scientist, NVIDIA

Dr. Mohammad Shoeybi is a Senior Research Scientist in the Applied Deep Learning Research group at NVIDIA. His interests are in Natural Language Processing (NLP) applications, unsupervised learning, and large-scale language modeling. Prior to NVIDIA, Mohammad worked at DeepMind and Baidu, leading efforts on deep learning for speech synthesis, text-to-speech, and recommender systems.
