Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement GenericJob interface on batchv1.Job (cherry-picked and resolved conflicts) #616

Merged
merged 2 commits into from
Mar 9, 2023

Conversation

mimowo
Copy link
Contributor

@mimowo mimowo commented Mar 8, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

This is an addition of the GenericJob interface, in preparation for creating a library for frameworks and to commonize code between Job and MPIJob.

Which issue(s) this PR fixes:

Part of #369

Special notes for your reviewer:

This is a fork of ebf2b81 by @kerthcet aligned with the current master.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Mar 8, 2023
@netlify
Copy link

netlify bot commented Mar 8, 2023

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 33ff294
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/640992db97ca0400079f08a0

@k8s-ci-robot k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Mar 8, 2023
@mimowo mimowo force-pushed the generic-job-interface branch 2 times, most recently from 790b086 to c5b7344 Compare March 8, 2023 14:20
@alculquicondor
Copy link
Contributor

@kerthcet we are eager to migrate the existing controllers to the library so that we can integrate Ray as well ray-project/kuberay#926. Is it ok with you if we merge this?

@mimowo mimowo force-pushed the generic-job-interface branch 2 times, most recently from 6926605 to 4fecb4e Compare March 8, 2023 14:23
Comment on lines +156 to +155
type BatchJob struct {
batchv1.Job
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type BatchJob struct {
batchv1.Job
}
type Job batchv1.Job

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is equivalent. If I do that the extra interface functions don't compile. The idea of introducing BatchJob is so that we can implement GenericJob.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this equivalent? https://go.dev/play/p/N0o1sEDyJVF

Copy link
Contributor Author

@mimowo mimowo Mar 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, the example works indeed. However, there seems to be some subtle differences.
One thing is that when doing type BatchJob struct {batchv1.Job } the BatchJob has the Job field, so the code didn't compile for me at some places. Also, the BatchJob.Object() method now doesn't compile, but I guess I can make fixes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a type rename, I think this should work, so we don't need more variables.

var job *Job
k8sClient.Get(ctx, job)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type definition will not inherit the methods. In k8sClient.Get(ctx, job) the job should be a Kubernetes Object. I considered type alias, but seems not the right scenario VS package migration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
up to you if you want to proceed with this or switch to type Job batch.Job

For now, I prefer the current approach, because it reflects the conceptual distinction between the Job object and the wrapper for the framework purposes. Also, I think we will have time to return to this on the follow up PR(s).

/hold cancel

pkg/controller/workload/job/job_controller.go Outdated Show resolved Hide resolved
pkg/controller/workload/job/job_controller.go Outdated Show resolved Hide resolved
pkg/controller/workload/job/interface.go Outdated Show resolved Hide resolved
pkg/controller/workload/job/helpers.go Outdated Show resolved Hide resolved
pkg/controller/workload/job/helpers.go Outdated Show resolved Hide resolved
pkg/controller/workload/job/job_controller.go Outdated Show resolved Hide resolved
@mimowo mimowo changed the title Implement GenericJob interface on batchv1.Job (forked and resolved conflicts) Implement GenericJob interface on batchv1.Job (cherry-picked and resolved conflicts) Mar 8, 2023
@kerthcet
Copy link
Contributor

kerthcet commented Mar 8, 2023

@kerthcet we are eager to migrate the existing controllers to the library so that we can integrate Ray as well ray-project/kuberay#926. Is it ok with you if we merge this?

Yes plz go ahead, I got something I need to take care of. I already talked with @mimowo offline. I will be back until tomorrow or longer. Thanks guys.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 8, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Mar 8, 2023
Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
up to you if you want to proceed with this or switch to type Job batch.Job

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, mimowo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 8, 2023
@alculquicondor
Copy link
Contributor

/lgtm
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 8, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 8, 2023
return nil
}

func (b *BatchJob) Finished() (metav1.Condition, bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One argu about the condition is maybe some other projects do not use v1.Condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I sort of agree, but I think we still have time to return to this decision as we get more info from other frameworks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If necessary, the controller implementations can translate from one object to the other. In general they should look pretty similar.

Comment on lines +156 to +155
type BatchJob struct {
batchv1.Job
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type definition will not inherit the methods. In k8sClient.Get(ctx, job) the job should be a Kubernetes Object. I considered type alias, but seems not the right scenario VS package migration.

}

// If there is no matching workload and the job is running, suspend it.
if match == nil && !job.IsSuspend() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used to think about move this out of the function then we can handle the Start & Stop operation in the main reconciler loop. But when the matched workload is nil we don't know the injected nodeSelector, so I raised the issue: #518

We can leave it as it-is and discuss about it in that issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems a separate issue, leaving as-is for now.

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 9, 2023
@mimowo
Copy link
Contributor Author

mimowo commented Mar 9, 2023

/hold
I've realized the first commit is't marked as co-authored by @kerthcet , let me try to fix it.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 9, 2023
Original commit by Kante Yin "Implement genericjob interface on batchv1.Job":
kubernetes-sigs@ebf2b81

Cherry-picked and conflicts resolution by Michal Wozniak <mw219725@gmail.com>

Co-authored-by: Kante Yin <kerthcet@gmail.com>
Co-authored-by: Michal Wozniak <mw219725@gmail.com>
# Conflicts:
#	pkg/controller/workload/job/job_controller.go
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 9, 2023
@mimowo
Copy link
Contributor Author

mimowo commented Mar 9, 2023

/hold
I've realized the first commit is't marked as co-authored by @kerthcet , let me try to fix it.

Done: https://github.com/kubernetes-sigs/kueue/pull/616/commits.

@kerthcet please renew the LGTM.

@mimowo
Copy link
Contributor Author

mimowo commented Mar 9, 2023

/hold cancel
waiting for LGTM

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 9, 2023
@kerthcet
Copy link
Contributor

kerthcet commented Mar 9, 2023

/lgtm
We need more feedbacks also.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 9, 2023
@k8s-ci-robot k8s-ci-robot merged commit e01a5a8 into kubernetes-sigs:main Mar 9, 2023
@mimowo mimowo deleted the generic-job-interface branch March 18, 2023 18:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants