
(add) deepspeed_mpi specific container, deepspeed_config for MPI with nodetaints #549

Closed
wants to merge 1 commit into from

Conversation


@ghost ghost commented Apr 13, 2023

This PR introduces an integration example of DeepSpeed, a distributed training library, with Kubeflow into the main mpi-operator examples. The objective of this example is to enhance the efficiency and performance of distributed training jobs by harnessing the combined capabilities of DeepSpeed and MPI. Comments in the configuration explain the use of taints and tolerations in the Kubernetes configuration to ensure the proper scheduling of DeepSpeed worker pods on nodes with specific resources, such as GPUs.
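For illustration, a minimal sketch of the taint/toleration setup described above, assuming a hypothetical NoSchedule taint with key nvidia.com/gpu on the GPU nodes (the taint key and node label are illustrative, not taken from this PR):

    Worker:
      replicas: 2
      template:
        spec:
          # Tolerate the (assumed) GPU node taint so worker pods can land on those nodes
          tolerations:
          - key: nvidia.com/gpu
            operator: Exists
            effect: NoSchedule
          # Illustrative label; match whatever label your GPU nodes actually carry
          nodeSelector:
            accelerator: nvidia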


google-cla bot commented Apr 13, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).


@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign alculquicondor for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines +24 to +39
- -bind-to
- none
- -map-by
- slot
- -x
- NCCL_DEBUG=INFO
- -x
- LD_LIBRARY_PATH
- -x
- PATH
- -mca
- pml
- ob1
- -mca
- btl
- ^openib
Collaborator

are all of these necessary?

@ghost ghost (Author) Apr 13, 2023

Not all of them are strictly necessary; however, these options are commonly used in both MPI workloads and the existing mpi-operator examples.

Do you think we need to remove these flags?

Collaborator

tbh, I left them as legacy from the very first examples I found for tensorflow and horovod, as I didn't know much about them.
But our basic MPI sample has almost no parameters https://github.com/kubeflow/mpi-operator/blob/master/examples/v2beta1/pi/pi.yaml

Collaborator

If you know enough to leave the bare basics, that would be better.
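For comparison, a stripped-down launcher command in the spirit of the pi.yaml sample above might keep only the NCCL and library-path flags (a sketch; the training script name is illustrative, and whether even these flags are needed depends on the workload):

    command:
    - mpirun
    # Forward NCCL debug output and library paths to the workers; drop if not needed
    - -x
    - NCCL_DEBUG=INFO
    - -x
    - LD_LIBRARY_PATH
    - python
    - train.py
    - --deepspeed_mpi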

@@ -0,0 +1,31 @@
# Official PyTorch image with CUDA support
FROM pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
Collaborator

Have you tried using mpioperator/base instead?

Author

Haven't tried it yet; it could be better to use the mpioperator/base image and just install the CUDA dependencies for DeepSpeed, plus the additional PyTorch/TensorFlow configuration.

Collaborator

Actually, probably better to use mpioperator/openmpi. If you can make it work, that'd be great, as proof that the base images can be extended. I couldn't get tensorflow to work.

Author

Will try to make it work for both and patch the PR 👍
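For reference, a rough, untested sketch of what extending mpioperator/openmpi could look like, assuming the base image is Debian-based and that a pip-installed CUDA build of PyTorch plus DeepSpeed is sufficient (package versions are illustrative):

    FROM mpioperator/openmpi
    # Python toolchain on top of the base image (assumed Debian-based)
    RUN apt-get update \
        && apt-get install -y --no-install-recommends python3 python3-pip \
        && rm -rf /var/lib/apt/lists/*
    # CUDA-enabled PyTorch wheel and DeepSpeed; pin versions as appropriate
    RUN pip3 install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html \
        && pip3 install deepspeed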

containers:
# Container with the DeepSpeed training image built from the provided Dockerfile and the DeepSpeed support
# Change your image name and version in here
- image: <YOUR-DEEPSPEED-CONTAINER-NAME>:<VERSION>
Member

Just a note: we can provide a sample image once #541 is completed.

@Syulin7

Syulin7 commented Apr 24, 2023

> DeepSpeed configures multi-node compute resources with hostfiles that are compatible with OpenMPI. A hostfile is a list of hostnames (or SSH aliases), which are machines accessible via passwordless SSH.

Do we need to support Deepspeed's own parallel launcher (via pdsh) in mpi-operator? The difference is that the default path for the hostfile in Deepspeed is /job/hostfile. Therefore, if the operator can generate /job/hostfile (like horovod discover_hosts.sh), it can support Deepspeed's own parallel launcher.

Ref: https://www.deepspeed.ai/getting-started/#resource-configuration-multi-node

@dtunai @alculquicondor @tenzen-y WDYT?

@tenzen-y (Member)

> DeepSpeed configures multi-node compute resources with hostfiles that are compatible with OpenMPI. A hostfile is a list of hostnames (or SSH aliases), which are machines accessible via passwordless SSH.
>
> Do we need to support Deepspeed's own parallel launcher (via pdsh) in mpi-operator? The difference is that the default path for the hostfile in Deepspeed is /job/hostfile. Therefore, if the operator can generate /job/hostfile (like horovod discover_hosts.sh), it can support Deepspeed's own parallel launcher.

Going by the above document, I don't think we need to generate the hostfile in /job/hostfile, since users can set the hostfile path via the deepspeed command, and deepspeed uses the same format as OpenMPI for the hostfile.

IIRC, we generate discover_hosts.sh for horovod since horovod uses a different format than OpenMPI for host discovery.
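For illustration, the hostfile format that both OpenMPI and DeepSpeed accept looks like the following (hostnames are examples; the mpi-operator generates its hostfile at /etc/mpi/hostfile, as discussed below):

    # one host per line, with the number of available slots (e.g. GPUs)
    deepspeed-mpijob-worker-0 slots=8
    deepspeed-mpijob-worker-1 slots=8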

@tenzen-y (Member)

Let me know if I'm missing anything else.

@Syulin7

Syulin7 commented Apr 24, 2023

Your understanding is correct. Currently, DeepSpeed supports the following three forms:

  1. Like this PR, launched with mpirun.
    mpirun python train.py --deepspeed_mpi

  2. launched with the "deepspeed" command, which will read the /job/hostfile file by default and via pdsh.
    deepspeed train.py

  3. launched with the "deepspeed" command, setting --hostfile=/etc/mpi/hostfile
    deepspeed --hostfile=/etc/mpi/hostfile train.py

Therefore, if we need to support the second and third forms in mpi-operator, perhaps we can remind users in the document that they must set --hostfile=/etc/mpi/hostfile?

I would like to add a new example to do this.
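A sketch of what the third form could look like as the launcher container's command in an MPIJob, reusing the hostfile the operator already generates (the script name train.py is illustrative):

    command:
    - deepspeed
    # Point DeepSpeed's pdsh launcher at the operator-generated hostfile
    - --hostfile=/etc/mpi/hostfile
    - train.py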

@tenzen-y (Member)

> Your understanding is correct. Currently, DeepSpeed supports the following three forms:
>
>   1. Like this PR, launched with mpirun.
>     mpirun python train.py --deepspeed_mpi
>   2. launched with the "deepspeed" command, which will read the /job/hostfile file by default and via pdsh.
>     deepspeed train.py
>   3. launched with the "deepspeed" command, setting --hostfile=/etc/mpi/hostfile
>     deepspeed --hostfile=/etc/mpi/hostfile train.py
>
> Therefore, if we need to support the second and third forms in mpi-operator, perhaps we can remind users in the document that they must set --hostfile=/etc/mpi/hostfile?
>
> I would like to add a new example to do this.

Thank you for clarifying.
Probably, it is enough to add an example for the first (this PR) and third forms.

@alculquicondor wdyt?

@alculquicondor (Collaborator)

Is there a way to specify the hostfile via environment variable? That's how we do it for mpirun.

Are there any changes required to have pdsh work?
Or maybe some features can be disabled, such as the secret that contains ssh keys?

@Syulin7

Syulin7 commented Apr 25, 2023

> Is there a way to specify the hostfile via environment variable? That's how we do it for mpirun.

I couldn't find an environment variable to specify the hostfile (like OMPI_MCA_orte_default_hostfile) in the DeepSpeed documentation.

> Are there any changes required to have pdsh work?
> Or maybe some features can be disabled, such as the secret that contains ssh keys?

The secret that contains SSH keys is necessary; pdsh also accesses workers via passwordless SSH.
Based on my testing, the hostfile and passwordless SSH are sufficient for pdsh to work.

@alculquicondor (Collaborator)

I guess we can move forward with this PR and provide another example using deepspeed --hostfile.
@dogukanutuna did you have a chance to make this work with mpioperator/base?

@ghost ghost closed this by deleting the head repository Jun 13, 2023
@tenzen-y (Member)

@simulark Why did you close this PR?

@ghost (Author)

ghost commented Jun 13, 2023

> @simulark Why did you close this PR?

Hello @tenzen-y.
It was an unintended action; if you cannot restore it right now, I can open a new PR based on mpioperator/base.

@tenzen-y (Member)

> > @simulark Why did you close this PR?
>
> Hello @tenzen-y. It was an unintended action; if you cannot restore it right now, I can open a new PR based on mpioperator/base.

Oh, I see. Thank you for letting me know!

This pull request was closed.