
(add) deepspeed_mpi specific container, deepspeed_config for MPI with nodetaints #549

Closed
wants to merge 1 commit into from

Conversation


@ghost ghost commented Apr 13, 2023

This PR introduces an integration example of DeepSpeed, a distributed training library, with Kubeflow into the main mpi-operator examples. The objective of this example is to enhance the efficiency and performance of distributed training jobs by harnessing the combined capabilities of DeepSpeed and MPI. Comments in the configuration explain the use of taints and tolerations in the Kubernetes configuration to ensure the proper scheduling of DeepSpeed worker pods on nodes with specific resources, such as GPUs.
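For illustration, a minimal sketch of the taint/toleration setup described above, assuming a hypothetical NoSchedule taint with key nvidia.com/gpu on the GPU nodes (the taint key and node label are illustrative, not taken from this PR):

    Worker:
      replicas: 2
      template:
        spec:
          # Tolerate the (assumed) GPU node taint so worker pods can land on those nodes
          tolerations:
          - key: nvidia.com/gpu
            operator: Exists
            effect: NoSchedule
          # Illustrative label; match whatever label your GPU nodes actually carry
          nodeSelector:
            accelerator: nvidia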


google-cla bot commented Apr 13, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).


@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign alculquicondor for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Comment on lines +24 to +39
- -bind-to
- none
- -map-by
- slot
- -x
- NCCL_DEBUG=INFO
- -x
- LD_LIBRARY_PATH
- -x
- PATH
- -mca
- pml
- ob1
- -mca
- btl
- ^openib
Collaborator

are all of these necessary?

@ghost ghost (Author) Apr 13, 2023

Not all of them are strictly necessary; however, these options are commonly used in both MPI workloads and the existing mpi-operator examples.

Do you think we need to remove these flags?

Collaborator

tbh, I left them as legacy from the very first examples I found for tensorflow and horovod, as I didn't know much about them.
But our basic MPI sample has almost no parameters https://github.com/kubeflow/mpi-operator/blob/master/examples/v2beta1/pi/pi.yaml

Collaborator

If you know enough to leave the bare basics, that would be better.
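For comparison, a stripped-down launcher command in the spirit of the pi.yaml sample above might keep only the NCCL and library-path flags (a sketch; the training script name is illustrative, and whether even these flags are needed depends on the workload):

    command:
    - mpirun
    # Forward NCCL debug output and library paths to the workers; drop if not needed
    - -x
    - NCCL_DEBUG=INFO
    - -x
    - LD_LIBRARY_PATH
    - python
    - train.py
    - --deepspeed_mpi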

@@ -0,0 +1,31 @@
# Official PyTorch image with CUDA support
FROM pytorch/pytorch:1.9.0-cuda11.1-cudnn8-runtime
Collaborator

Have you tried using mpioperator/base instead?

Author

Haven't tried it yet; it could be better to use the mpioperator/base image and just install the CUDA dependencies for DeepSpeed, plus the additional PyTorch/TensorFlow configuration.

Collaborator

Actually, probably better to use mpioperator/openmpi. If you can make it work, that'd be great, as proof that the base images can be extended. I couldn't get tensorflow to work.

Author

Will try to make it work for both and patch the PR 👍
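For reference, a rough, untested sketch of what extending mpioperator/openmpi could look like, assuming the base image is Debian-based and that a pip-installed CUDA build of PyTorch plus DeepSpeed is sufficient (package versions are illustrative):

    FROM mpioperator/openmpi
    # Python toolchain on top of the base image (assumed Debian-based)
    RUN apt-get update \
        && apt-get install -y --no-install-recommends python3 python3-pip \
        && rm -rf /var/lib/apt/lists/*
    # CUDA-enabled PyTorch wheel and DeepSpeed; pin versions as appropriate
    RUN pip3 install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html \
        && pip3 install deepspeed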

containers:
# Container with the DeepSpeed training image built from the provided Dockerfile and the DeepSpeed support
# Change your image name and version in here
- image: <YOUR-DEEPSPEED-CONTAINER-NAME>:<VERSION>
Member

Just a note: we can provide a sample image once #541 is completed.

@Syulin7

Syulin7 commented Apr 24, 2023

> DeepSpeed configures multi-node compute resources with hostfiles that are compatible with OpenMPI. A hostfile is a list of hostnames (or SSH aliases), which are machines accessible via passwordless SSH.

Do we need to support Deepspeed's own parallel launcher (via pdsh) in mpi-operator? The difference is that the default path for the hostfile in Deepspeed is /job/hostfile. Therefore, if the operator can generate /job/hostfile (like horovod discover_hosts.sh), it can support Deepspeed's own parallel launcher.

Ref: https://www.deepspeed.ai/getting-started/#resource-configuration-multi-node

@dtunai @alculquicondor @tenzen-y WDYT?

@tenzen-y (Member)

> DeepSpeed configures multi-node compute resources with hostfiles that are compatible with OpenMPI. A hostfile is a list of hostnames (or SSH aliases), which are machines accessible via passwordless SSH.
>
> Do we need to support Deepspeed's own parallel launcher (via pdsh) in mpi-operator? The difference is that the default path for the hostfile in Deepspeed is /job/hostfile. Therefore, if the operator can generate /job/hostfile (like horovod discover_hosts.sh), it can support Deepspeed's own parallel launcher.

Going by the above document, I don't think we need to generate the hostfile in /job/hostfile, since users can set the hostfile path via the deepspeed command, and deepspeed uses the same format as OpenMPI for the hostfile.

IIRC, we generate discover_hosts.sh for horovod since horovod uses a different format than OpenMPI for host discovery.
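For illustration, the hostfile format that both OpenMPI and DeepSpeed accept looks like the following (hostnames are examples; the mpi-operator generates its hostfile at /etc/mpi/hostfile, as discussed below):

    # one host per line, with the number of available slots (e.g. GPUs)
    deepspeed-mpijob-worker-0 slots=8
    deepspeed-mpijob-worker-1 slots=8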

@tenzen-y (Member)

Let me know if I'm missing anything else.

@Syulin7

Syulin7 commented Apr 24, 2023

Your understanding is correct. Currently, DeepSpeed supports the following three forms:

  1. Like this PR, launched with mpirun.
    mpirun python train.py --deepspeed_mpi

  2. launched with the "deepspeed" command, which will read the /job/hostfile file by default and via pdsh.
    deepspeed train.py

  3. launched with the "deepspeed" command, setting --hostfile=/etc/mpi/hostfile
    deepspeed --hostfile=/etc/mpi/hostfile train.py

Therefore, if we need to support the second and third forms in mpi-operator, perhaps we can remind users in the document that they must set --hostfile=/etc/mpi/hostfile?

I would like to add a new example to do this.
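A sketch of what the third form could look like as the launcher container's command in an MPIJob, reusing the hostfile the operator already generates (the script name train.py is illustrative):

    command:
    - deepspeed
    # Point DeepSpeed's pdsh launcher at the operator-generated hostfile
    - --hostfile=/etc/mpi/hostfile
    - train.py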

@tenzen-y (Member)

> Your understanding is correct. Currently, DeepSpeed supports the following three forms:
>
>   1. Like this PR, launched with mpirun.
>     mpirun python train.py --deepspeed_mpi
>   2. launched with the "deepspeed" command, which will read the /job/hostfile file by default and via pdsh.
>     deepspeed train.py
>   3. launched with the "deepspeed" command, setting --hostfile=/etc/mpi/hostfile
>     deepspeed --hostfile=/etc/mpi/hostfile train.py
>
> Therefore, if we need to support the second and third forms in mpi-operator, perhaps we can remind users in the document that they must set --hostfile=/etc/mpi/hostfile?
>
> I would like to add a new example to do this.

Thank you for clarifying.
Probably, it is enough to add an example for the first (this PR) and third forms.

@alculquicondor wdyt?

@alculquicondor (Collaborator)

Is there a way to specify the hostfile via environment variable? That's how we do it for mpirun.

Are there any changes required to have pdsh work?
Or maybe some features can be disabled, such as the secret that contains ssh keys?

@Syulin7

Syulin7 commented Apr 25, 2023

> Is there a way to specify the hostfile via environment variable? That's how we do it for mpirun.

I couldn't find an environment variable to specify the hostfile (like OMPI_MCA_orte_default_hostfile) in the DeepSpeed documentation.

> Are there any changes required to have pdsh work?
> Or maybe some features can be disabled, such as the secret that contains ssh keys?

The secret that contains SSH keys is necessary; pdsh also accesses workers via passwordless SSH.
Based on my testing, the hostfile and passwordless SSH are sufficient for pdsh to work.

@alculquicondor (Collaborator)

I guess we can move forward with this PR and provide another example using deepspeed --hostfile.
@dogukanutuna did you have a chance to make this work with mpioperator/base?

@ghost ghost closed this by deleting the head repository Jun 13, 2023
@tenzen-y (Member)

@simulark Why did you close this PR?

@ghost (Author)

ghost commented Jun 13, 2023

> @simulark Why did you close this PR?

Hello @tenzen-y.
It was an unintended action; if you cannot restore it right now, I can open a new PR based on mpioperator/base.

@tenzen-y (Member)

> > @simulark Why did you close this PR?
>
> Hello @tenzen-y. It was an unintended action; if you cannot restore it right now, I can open a new PR based on mpioperator/base.

Oh, I see. Thank you for letting me know!

This pull request was closed.