Megatron GPT model finetuning #6210

Merged
merged 87 commits into from
Apr 6, 2023

MaximumEntropy
Contributor

What does this PR do?

Adds the ability to fine-tune Megatron GPT Models.

Collection: NLP

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

soares-f and others added 30 commits December 19, 2022 06:42
Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
pre-commit-ci bot and others added 3 commits April 3, 2023 23:36
@github-actions github-actions bot added the CI label Apr 4, 2023
Collaborator

@yidong72 yidong72 left a comment

Looks good. Left some comments.

return model


def load_from_checkpoint_dir(cls, cfg, trainer, modify_confg_fn):
Collaborator

Or we can put this into a utility function. It is used in many other places to load from a checkpoint dir.
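
A minimal sketch of what such a shared helper could look like, not the code from this PR. It assumes the hyperparameters are stored as JSON and that the model class exposes some loader callable; `load_model_from_checkpoint_dir`, `load_fn`, and all parameter names are hypothetical and chosen only for illustration.

```python
import json
import os
from typing import Any, Callable, Dict


def load_model_from_checkpoint_dir(
    load_fn: Callable[[str, Dict[str, Any]], Any],
    checkpoint_dir: str,
    checkpoint_name: str,
    hparams_file: str,
    modify_config_fn: Callable[[Dict[str, Any]], Dict[str, Any]],
) -> Any:
    """Hypothetical shared helper for loading a model from a checkpoint dir.

    Resolves the checkpoint path, reads the saved hyperparameters, lets the
    caller override them, then delegates the actual loading to `load_fn`.
    """
    checkpoint_path = os.path.join(checkpoint_dir, checkpoint_name)

    # Assumption: hyperparameters are stored as JSON. A real project may
    # use YAML or another format instead.
    with open(hparams_file, "r", encoding="utf-8") as f:
        saved_cfg = json.load(f)

    # Pass a copy so the callback cannot mutate the config that was read.
    cfg = modify_config_fn(dict(saved_cfg))

    return load_fn(checkpoint_path, cfg)
```

Each call site would then pass only its own loader and its own config override, for example a callback that sets dropout to a fine-tuning value, instead of repeating the path and config handling.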

text = self.prompt_template.replace('{input}', original_context).replace('{output}', output)

if self.separate_prompt_and_response_with_newline and self.prompt_template is None:
    text = context + '\n' + output
Collaborator

Should we use user-provided separators?

Contributor Author

I think the prompt_template should cover this case, right?
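
To illustrate the point in this thread, here is a small standalone sketch of how a prompt template can carry any separator the user wants. It mirrors the `{input}` / `{output}` replacement shown in the diff context above, but `build_training_text` and its parameter names are hypothetical, and the single-space fallback is an assumption rather than something taken from the PR.

```python
from typing import Optional


def build_training_text(
    context: str,
    output: str,
    prompt_template: Optional[str] = None,
    separate_with_newline: bool = False,
) -> str:
    """Join a prompt and its response into one training string."""
    if prompt_template is not None:
        # The template decides the separator: whatever sits between
        # '{input}' and '{output}' ends up between prompt and response.
        return prompt_template.replace('{input}', context).replace('{output}', output)
    if separate_with_newline:
        # No template: fall back to a fixed newline separator.
        return context + '\n' + output
    # Assumed fallback for this sketch only: a single space.
    return context + ' ' + output


# A template with a custom separator between prompt and response.
template = "User: {input}\n\nAssistant: {output}"
print(build_training_text("What is 2 + 2?", "4", prompt_template=template))
```

With a template, no dedicated separator option is needed: the separator is whatever text the user places between the two placeholders.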

if self.prompt_template is not None:
    import ipdb

    ipdb.set_trace()
Collaborator

Remove the debug statement?

from argparse import ArgumentParser
from multiprocessing import Pool

from sacremoses import MosesDetokenizer
Collaborator

Is it part of the plan to release the NIV and T0 data preprocessing scripts? We would like others to be able to SFT GPT with the same instruction dataset.

@okuchaiev okuchaiev requested a review from arendu April 6, 2023 21:32
Collaborator

@ericharper ericharper left a comment

LGTM. Thanks!

@ericharper ericharper merged commit 714eded into main Apr 6, 2023
@ericharper ericharper deleted the sandeepsub/gpt_sft_stable_rebase_main branch April 6, 2023 23:08
hsiehjackson pushed a commit to hsiehjackson/NeMo that referenced this pull request Jun 2, 2023
* copy from sft_from_gpt
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* Changed tokenization and example
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* maybe remove (got from upstream)
* Eval metrics while finetuning
* Add missing args
* Add arg
* Fix
* Fix
* Wrap in try except
* Try fix
* Fix
* Add separate validation and test batch sizes
* Fix
* Fix
* Fix
* Add assert
* Fix
* Fix checkpoint name
* Explict sampling args
* Update t0 script
* Add niv2 script
* Change workers
* Fix labels
* Ignore download
* Minor fixes
* Add dist opt support
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Minor
* Allow skipping validation
* Fix tokenization and padding to max batch
* Adds several configurable flags for Megatron GPT models (NVIDIA#5991)
  * Initial
  * Multiple fixes
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Fix
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Add to CI test
  * Fix
  * check position embs for gpt prompt learning
    Signed-off-by: Adi Renduchintala <adithya.r@gmail.com>
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Update args
  * Disable tts unit test
  * Fix
  * Fix
  * Empty
  * Update Jenkinsfile
    Changed optimizer for GPT training from 'fused_adam' to 'distributed_fused_adam'.
    Signed-off-by: khcs <khcs@users.noreply.github.com>
  * update config to to use correct key
    Signed-off-by: ericharper <complex451@gmail.com>
  * revert Jenkinsfile back to fused_adam
    Signed-off-by: ericharper <complex451@gmail.com>

  ---------

  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Signed-off-by: Adi Renduchintala <adithya.r@gmail.com>
  Signed-off-by: khcs <khcs@users.noreply.github.com>
  Signed-off-by: ericharper <complex451@gmail.com>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
  Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
  Co-authored-by: khcs <khcs@users.noreply.github.com>
  Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
  Co-authored-by: ericharper <complex451@gmail.com>

* Fast glu activations (NVIDIA#6058)
  * fast glu activations
  * [pre-commit.ci] auto fixes from pre-commit.com hooks
  * Fix
  * Fix
  * Clean up activation list
  * [pre-commit.ci] auto fixes from pre-commit.com hooks

  ---------

  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Explicitly check for united embeddings when logging params (NVIDIA#6085)
  * Explicitly check for united embeddings
  * [pre-commit.ci] auto fixes from pre-commit.com hooks

  ---------

  Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
  Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Option for model extracted dir
* Fix
* Fix
* Add index mapping dir
* Assistant prompt
* Fix
* Remove ipdb
* Fix
* Override dropout
* Change sampler
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Roll back again
* Revert TTS
* Reset TTS
* Revert further
* Revert more to main
* Fix Test DS
* Address PR comments
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Add the option to provide a prompt template via fstrings
* [pre-commit.ci] auto fixes from pre-commit.com hooks
* Add CI test
* fix ci test
* Fix CI test
* Minor
* Fix CI
* Fix CI
* Fix
* Fix CI
* Fix workers issue
* Fix workers

---------

Signed-off-by: MaximumEntropy <sandeep.subramanian.1@umontreal.ca>
Signed-off-by: Adi Renduchintala <adithya.r@gmail.com>
Signed-off-by: khcs <khcs@users.noreply.github.com>
Signed-off-by: ericharper <complex451@gmail.com>
Co-authored-by: soares-f <soarescmsa@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Adi Renduchintala <adithya.r@gmail.com>
Co-authored-by: khcs <khcs@users.noreply.github.com>
Co-authored-by: Oleksii Kuchaiev <okuchaiev@users.noreply.github.com>
Co-authored-by: ericharper <complex451@gmail.com>
Signed-off-by: hsiehjackson <c2hsieh@ucsd.edu>
5 participants