AI & Deep Learning

Introduction

Date: October 17, 2019
Time: 9:00am – 10:00am PDT
Duration: 1 hour


Recent work in unsupervised language modeling, such as BERT and GPT-2, demonstrates that training large neural language models advances the state of the art in Natural Language Processing (NLP) applications. However, memory constraints limit the size of models that can be practically trained on a single processor. Model parallelism makes it possible to train larger models by splitting their parameters across multiple processors, but designing such an approach that is both simple and efficient remains a challenge.
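
The intuition behind splitting parameters can be shown with a short, self-contained PyTorch sketch: one linear layer's weight matrix is split column-wise into two shards, each shard computes its part of the output, and the pieces are combined. The 2-way split, the shapes, and the single-process setup are illustrative assumptions for this page, not the actual Megatron implementation covered in the webinar.

```python
# Minimal sketch of tensor (model) parallelism for one linear layer.
# The 2-way split and all names here are illustrative assumptions,
# not NVIDIA's Megatron code.
import torch

torch.manual_seed(0)

batch, d_in, d_out = 4, 8, 6
x = torch.randn(batch, d_in)

# Full weight of a linear layer: y = x @ W
W = torch.randn(d_in, d_out)

# Column-parallel split: each "processor" would hold half of the output columns.
W0, W1 = W.chunk(2, dim=1)

# Each shard computes its slice of the output independently ...
y0 = x @ W0
y1 = x @ W1

# ... and the partial outputs are concatenated (a collective communication
# step across GPUs in a real distributed setup) to recover the full activation.
y_parallel = torch.cat([y0, y1], dim=1)

# The split computation matches the unsplit one.
assert torch.allclose(y_parallel, x @ W, atol=1e-6)
print("column-parallel output matches:", y_parallel.shape)
```

In a real multi-GPU run the shards would live on different devices and the concatenation would be a communication step; how Megatron organizes such splits efficiently is one of the topics of the webinar.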


Recently, NVIDIA Research launched project Megatron to enable training state-of-the-art transformer language models with billions of parameters. Join this webinar to learn how NVIDIA researchers created Megatron, the largest Transformer language model trained to date, with 8.3 billion parameters (24x the size of BERT and 5.6x the size of GPT-2). Trained on 174 GB of text, the model establishes new state-of-the-art results on tasks such as LAMBADA, which tests the ability to model long-term dependencies.



By attending this webinar, you will:
  • Understand the implementation of efficient model parallelism without any new compiler or model rewriting
  • Learn about the pre-training process for Megatron
  • Learn about zero-shot evaluations using Megatron
  • Get an overview of the PyTorch code used to train this model
  • Grasp the scope of Megatron’s future uses

WEBINAR REGISTRATION

THANK YOU FOR REGISTERING FOR THE WEBINAR



You will receive an email with instructions on how to join the webinar shortly.

DGX Station Datasheet

Get a quick overview and the technical specs for the DGX Station.
DGX Station Whitepaper

Dive deeper into the DGX Station and learn more about the architecture, NVLink, frameworks, tools and more.

Speaker

Dr. Mohammad Shoeybi

Senior Research Scientist, NVIDIA

Dr. Mohammad Shoeybi is a Senior Research Scientist in the Applied Deep Learning Research group at NVIDIA. His interests are in Natural Language Processing (NLP) applications, unsupervised learning, and large-scale language modeling. Prior to NVIDIA, Mohammad worked at DeepMind and Baidu, leading efforts on deep learning for speech synthesis, text-to-speech, and recommender systems.
