Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Job interface to restrain the behaviors of different Job implementations #369

Closed
2 of 3 tasks
kerthcet opened this issue Sep 5, 2022 · 21 comments
Closed
2 of 3 tasks
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@kerthcet
Copy link
Contributor

kerthcet commented Sep 5, 2022

What would you like to be added:

Currently, we have a default integrated implementation of batchv1 job, we achieve the expected behaviors by the experiences of kubernetes, and most importantly, we know the project well.

But for people who wants to implement a special controller in kueue, they have little knowledges and they may don't know where to start. Documents are great and always needed, but they're not binding.

So we should make a contract known as interface in golang, which defines the behaviors of the various jobs. E.g.:

type Job interface {
	Suspend()
	UnSuspend()
	QueueName()
	IsSuspend() bool
	Start() error
	Stop() error
	ConstructWorkload() (*kueue.Workload, error)
}

And we can wrap batch job like below, BatchJob is an implement of kueue.Job

type BatchJob struct {
	job batchv1.Job
}

Why is this needed:

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@kerthcet kerthcet added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 5, 2022
@tenzen-y
Copy link
Member

tenzen-y commented Sep 5, 2022

That sounds great!
The job interface helps support Kueue with custom jobs such as TFJob, PytorchJob, and MPIJob in kubeflow/training-operator and kubeflow/mpi-operator/v2.

@alculquicondor
Copy link
Contributor

I would also add some form of InjectNodeAffinity

@tenzen-y
Copy link
Member

I would also add some form of InjectNodeAffinity

Does this mean the following?

nodeSelector, err := r.getNodeSelectors(ctx, w)
if err != nil {
return err
}
if len(nodeSelector) != 0 {
if job.Spec.Template.Spec.NodeSelector == nil {
job.Spec.Template.Spec.NodeSelector = nodeSelector
} else {
for k, v := range nodeSelector {
job.Spec.Template.Spec.NodeSelector[k] = v
}
}
} else {
log.V(3).Info("no nodeSelectors to inject")
}

@alculquicondor
Copy link
Contributor

yes

@tenzen-y
Copy link
Member

SGTM

@k8s-triage-robot
Copy link

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 11, 2022
@tenzen-y
Copy link
Member

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 11, 2022
@kerthcet
Copy link
Contributor Author

kerthcet commented Jan 9, 2023

We have a plan to integrate with other workloads, also see #499, I'd like to write a design doc at first.
/assign

@tenzen-y
Copy link
Member

tenzen-y commented Jan 9, 2023

I'm also working on the kubeflow project.

So I'm thinking of giving feedback on the design doc since I'd like to support Kueue with training-operator and mpi-operator v2.

@kerthcet
Copy link
Contributor Author

kerthcet commented Jan 9, 2023

Thanks @tenzen-y , when you're ready, plz also cc me, thanks.

@alculquicondor
Copy link
Contributor

From the messages above, I'm not sure who is writing the design doc. It sounds like @kerthcet, is that correct?

@tenzen-y
Copy link
Member

tenzen-y commented Jan 9, 2023

From the messages above, I'm not sure who is writing the design doc. It sounds like @kerthcet, is that correct?

Probably, @kerthcet is writing the doc.

@tenzen-y
Copy link
Member

tenzen-y commented Jan 9, 2023

If @kerthcet doesn't have enough time, I can write it.

@kerthcet
Copy link
Contributor Author

kerthcet commented Jan 9, 2023

Yes, I'm writing now, almost finished.

@tenzen-y
Copy link
Member

tenzen-y commented Jan 9, 2023

Sounds great.

@alculquicondor
Copy link
Contributor

@kerthcet do you have a draft to share? It might be beneficial to provide you feedback already.

@alculquicondor
Copy link
Contributor

Oh, from slack, it seems that Kante is OOO until the end of the month

@tenzen-y
Copy link
Member

Thank you for letting us know.

@kerthcet
Copy link
Contributor Author

FYI #537 KEP is ready for first round of review.

@alculquicondor
Copy link
Contributor

This is completed

/close

@k8s-ci-robot
Copy link
Contributor

@alculquicondor: Closing this issue.

In response to this:

This is completed

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
Projects
None yet
Development

No branches or pull requests

5 participants