optimize the pod postfix naming style to make its length shorter #721

Closed
nicklhy opened this issue Feb 26, 2020 · 5 comments
Labels
kind/bug, kind/feature

Comments


nicklhy commented Feb 26, 2020

/kind feature

What happened:
I just upgraded Volcano from v0.2 to the latest master branch (by running kubectl apply -f https://raw.githubusercontent.com/volcano-sh/volcano/master/installer/volcano-development.yaml ). However, I soon noticed that some of my existing model training YAML files can no longer be launched. Running kubectl describe jobs.batch.volcano.sh -n mdt ${JobName} shows that the error is caused by a name in the generated pod specs exceeding the length limit:

 Warning  FailedCreate  7s               vc-controller-manager  Error creating pods: [failed to create pod volcano-mpi-softmax-job-mpiworker-1, err: &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"Pod \"volcano-mpi-softmax-job-mpiworker-1\" is invalid: [spec.volumes[3].name: Invalid value: \"volcano-mpi-softmax-job-0cdbfd01-7a0e-45e6-bf17-a580fa3a0a46-ssh\": must be no more than 63 characters, spec.containers[0].volumeMounts[3].name: Not found: \"volcano-mpi-softmax-job-0cdbfd01-7a0e-45e6-bf17-a580fa3a0a46-ssh\"]", Reason:"Invalid", Details:(*v1.StatusDetails)(0xc0008009c0), Code:422}}
 failed to create pod volcano-mpi-softmax-job-mpiworker-0, err: &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"Pod \"volcano-mpi-softmax-job-mpiworker-0\" is invalid: [spec.volumes[3].name: Invalid value: \"volcano-mpi-softmax-job-0cdbfd01-7a0e-45e6-bf17-a580fa3a0a46-ssh\": must be no more than 63 characters, spec.containers[0].volumeMounts[3].name: Not found: \"volcano-mpi-softmax-job-0cdbfd01-7a0e-45e6-bf17-a580fa3a0a46-ssh\"]", Reason:"Invalid", Details:(*v1.StatusDetails)(0xc000875020), Code:422}}
 failed to create pod volcano-mpi-softmax-job-mpimaster-0, err: &errors.StatusError{ErrStatus:v1.Status{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ListMeta:v1.ListMeta{SelfLink:"", ResourceVersion:"", Continue:"", RemainingItemCount:(*int64)(nil)}, Status:"Failure", Message:"Pod \"volcano-mpi-softmax-job-mpimaster-0\" is invalid: [spec.volumes[0].name: Invalid value: \"volcano-mpi-softmax-job-0cdbfd01-7a0e-45e6-bf17-a580fa3a0a46-ssh\": must be no more than 63 characters, spec.containers[0].volumeMounts[0].name: Not found: \"volcano-mpi-softmax-job-0cdbfd01-7a0e-45e6-bf17-a580fa3a0a46-ssh\"]", Reason:"Invalid", Details:(*v1.StatusDetails)(0xc000704ba0), Code:422}}]

The job name in my YAML file is "volcano-mpi-softmax-job" (which caused no trouble in Volcano v0.2), but the generated volume name "volcano-mpi-softmax-job-0cdbfd01-7a0e-45e6-bf17-a580fa3a0a46-ssh" is 64 characters long and breaks the 63-character limit that Kubernetes reports in the error above.
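To make the overflow concrete, here is a minimal sketch of the length arithmetic. It assumes the failing value is built as "<job name>-<job UID>-ssh", which is what the string in the error looks like; the helper names sshSecretName and nameBudget are made up for illustration and are not Volcano APIs.

```go
package main

import "fmt"

// maxNameLen is the 63-character limit quoted in the API server error.
const maxNameLen = 63

// sshSecretName is a hypothetical helper that rebuilds the failing value
// from its three visible parts: job name, job UID, and the "ssh" suffix.
func sshSecretName(jobName, jobUID, pluginName string) string {
	return fmt.Sprintf("%s-%s-%s", jobName, jobUID, pluginName)
}

// nameBudget returns how many characters remain for the job name once
// the "-<uid>-<plugin>" suffix has been accounted for.
func nameBudget(jobUID, pluginName string) int {
	return maxNameLen - len("-"+jobUID+"-"+pluginName)
}

func main() {
	uid := "0cdbfd01-7a0e-45e6-bf17-a580fa3a0a46" // UID taken from the error above
	name := sshSecretName("volcano-mpi-softmax-job", uid, "ssh")

	fmt.Println(len(name), len(name) > maxNameLen) // prints "64 true"
	fmt.Println(nameBudget(uid, "ssh"))            // prints "22"
}
```

Under this assumption a 36-character UID plus "-ssh" leaves only 22 characters for the job name, so the 23-character "volcano-mpi-softmax-job" overflows by exactly one.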

Sure, I can work around this by shortening my job name. However, I think it would be much better to shorten the postfix "-0cdbfd01-7a0e-45e6-bf17-a580fa3a0a46-ssh" and free up some space for the user-defined job name.

How to reproduce it (as minimally and precisely as possible):

Just change the job name in "mpi-example.yaml" from "lm-mpi-job" to "lm-mpi-job-very-long-version" and launch it.

Anything else we need to know?:

Environment:

  • Volcano Version: the latest master branch
  • Kubernetes version (use kubectl version): v1.15.2
  • OS (e.g. from /etc/os-release): Ubuntu 16.04.2
volcano-sh-bot added the kind/feature label Feb 26, 2020
@hzxuzhonghu
Collaborator

/kind bug

volcano-sh-bot added the kind/bug label Feb 26, 2020
@hzxuzhonghu
Collaborator

volcano-mpi-softmax-job-0cdbfd01-7a0e-45e6-bf17-a580fa3a0a46-ssh

I don't remember us generating such long random strings. Could you please provide your job YAML?

@hzxuzhonghu
Collaborator

I see, you used the ssh plugin:

func (sp *sshPlugin) secretName(job *batch.Job) string {
	return fmt.Sprintf("%s-%s-%s", job.Name, job.UID, sp.Name())
}

Either way, we should validate this.

@k82cn
Member

k82cn commented Feb 26, 2020

I see, you used the ssh plugin:

func (sp *sshPlugin) secretName(job *batch.Job) string {
	return fmt.Sprintf("%s-%s-%s", job.Name, job.UID, sp.Name())
}

Either way, we should validate this.

We should find another way to generate its name :)
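One possible direction, sketched here only as an illustration and not as what the eventual fix does: replace the full UID with a short hash of it and truncate the job name so the result always fits. The function shortSecretName and the choice of an 8-hex-character SHA-256 prefix are assumptions made for this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// maxNameLen is the limit reported by the API server in the error above.
const maxNameLen = 63

// shortSecretName is a hypothetical alternative to "%s-%s-%s": it keeps the
// job name readable, shortens the UID to 8 hex characters of its SHA-256
// digest, and truncates the job name if the result would still be too long.
// Job names are assumed to be ASCII, so byte slicing is safe here.
func shortSecretName(jobName, jobUID, pluginName string) string {
	sum := sha256.Sum256([]byte(jobUID))
	suffix := "-" + hex.EncodeToString(sum[:4]) + "-" + pluginName

	room := maxNameLen - len(suffix)
	if room < 0 {
		room = 0 // plugin name alone is too long; nothing left for the job name
	}
	if len(jobName) > room {
		jobName = jobName[:room]
	}
	return jobName + suffix
}

func main() {
	uid := "0cdbfd01-7a0e-45e6-bf17-a580fa3a0a46" // example UID from this issue
	name := shortSecretName("volcano-mpi-softmax-job", uid, "ssh")
	fmt.Println(name, len(name))
}
```

The trade-off in this sketch is that a shortened hash is no longer guaranteed unique, although collisions between jobs that share a name within one namespace are very unlikely with 32 bits of digest.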

@k82cn
Member

k82cn commented Mar 4, 2020

fixed by #726

k82cn closed this as completed Mar 4, 2020