In-framework deployment #9438

Merged
merged 47 commits into from
Jun 13, 2024

Conversation

oyilmaz-nvidia
Collaborator

What does this PR do?

Adds in-framework deployment. It takes the commits from PR #8958 and makes a few changes on top of them.

jukim-nv and others added 30 commits April 17, 2024 15:42
…elds from the relevant internal classes instead of hard-coding whenever possible
oyilmaz-nvidia and others added 14 commits May 13, 2024 17:31
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
@github-actions github-actions bot removed the NLP label Jun 12, 2024
Collaborator

@meatybobby meatybobby left a comment

LGTM

@oyilmaz-nvidia oyilmaz-nvidia merged commit a01fa6d into NVIDIA:main Jun 13, 2024
208 checks passed
galv pushed a commit to galv/NeMo that referenced this pull request Jun 13, 2024
* initial MegatronGPTDeployable class

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* delete old comment

* first draft of MegatronGPTDeployable test script

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* small cleanup of test_triton_deployable.py

* move MegatronGPTDeployable into nlp folder since it is language specific

* update test_triton_deployable for new MegatronGPTDeployable location

* renaming NemoQueryLLM classes

* MegatronGPTDeployable should programmatically generate input/output fields from the relevant internal classes instead of hard-coding whenever possible

* add NemoTritonQueryLLMPyTorch class and example

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* MegatronGPTModel should always load on creation, also allow number of gpus to be controlled via argument

* got logprobs working, but can only process one prompt at a time

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* add nemo deployable to deploy_triton.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* multigpu working, with manual torch.distributed calls

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rename MegatronGPTDeployable to MegatronLLMDeployable

* MegatronGPTDeployable->MegatronLLMDeployable rename for filenames

* move torch.distributed calls inside MegatronLLMDeployable

* add constructor for existing model class, tested working with Mistral7B and Nemotron3-22B

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* rename test_triton_deployable.py to tests_pytriton_deploy.py

* cleanup, comments, and style guide fixes

* add warning for multigpu cases where users will need to be aware of pytorch lightning DDP behavior

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fixing formatting of logprob outputs

* fix single gpu behavior, and add padding to outputs to allow for multi-prompt logprob calculation

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>

* fixing codeQL issues

* Apply isort and black reformatting

Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>

* removed min_length definition in previous commit but forgot to remove its use

* update comments and arguments in deploy/nlp/query_llm.py

* Apply isort and black reformatting

Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>

* delete unused arguments from test_pytriton_deploy.py

* remove some debug prints from megatronllm_deployable

* rename test file due to pytest issue

Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>

---------

Signed-off-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: Onur Yilmaz <oyilmaz@nvidia.com>
Co-authored-by: Justin Kim <jukim@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: oyilmaz-nvidia <oyilmaz-nvidia@users.noreply.github.com>
Co-authored-by: jukim-nv <jukim-nv@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
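
Note on the "add padding to outputs to allow for multi-prompt logprob calculation" commit above: the PR's code is not shown on this page, so the following is only a minimal sketch of the general idea, not the implementation that was merged. It assumes that each prompt yields a variable-length list of per-token log-probabilities and that a batched response has to be rectangular before it can be returned as a single array. The function name `pad_logprobs` and the `pad_value` default are hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence


def pad_logprobs(
    rows: Sequence[Sequence[float]], pad_value: float = 0.0
) -> List[List[float]]:
    """Right-pad variable-length log-prob rows to a common length.

    Hypothetical helper: each row holds the per-token log-probabilities
    for one prompt. Rows shorter than the longest one are extended with
    pad_value so the result is rectangular.
    """
    # Length of the longest row; 0 when there are no rows at all.
    max_len = max((len(row) for row in rows), default=0)
    return [list(row) + [pad_value] * (max_len - len(row)) for row in rows]


# Two prompts whose outputs have different token counts.
batch = [[-0.1, -0.5, -0.2], [-0.3]]
padded = pad_logprobs(batch)
print(padded)
```

A caller that needs to tell real values from padding would also have to keep the original lengths (or a mask) alongside the padded rows; whether the merged code does this is not stated here.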
JesusPaz pushed a commit to JesusPaz/NeMo that referenced this pull request Jun 18, 2024
rohitrango pushed a commit to rohitrango/NeMo that referenced this pull request Jun 25, 2024
@ko3n1g ko3n1g mentioned this pull request Jul 18, 2024
2 tasks
5 participants