
Record injected node affinity in batch Job #518

Closed
kerthcet opened this issue Jan 16, 2023 · 39 comments · Fixed by #660 or #861
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@kerthcet
Contributor

What would you like to be added:

When we want to suspend a Job, we'd like to restore the original nodeAffinity, but sometimes we can't find the derived workload; see:

if len(workloads.Items) == 1 {
// The job may have been modified and hence the existing workload
// doesn't match the job anymore. All bets are off if there are more
// than one workload...
w = &workloads.Items[0]
}

I'd like to add the injected nodeAffinity to the Job's annotations so that restoring it is always accurate. It would look like:

metadata:
  annotations:
    kueue.sig.kubernetes.io/injected-node-affinity: '{"key1": "value1", "key2": "value2"}'
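
For illustration, a minimal Go sketch of how such an annotation could be written and read back. Only the annotation key comes from the proposal above; the helper names (recordInjectedSelector, injectedSelector) are hypothetical, not kueue code:

package controller

import (
	"encoding/json"

	batchv1 "k8s.io/api/batch/v1"
)

// Annotation key taken from the proposal above.
const injectedNodeAffinityKey = "kueue.sig.kubernetes.io/injected-node-affinity"

// recordInjectedSelector stores the injected selector on the Job so the
// original can be recovered even if the Workload object disappears.
func recordInjectedSelector(job *batchv1.Job, injected map[string]string) error {
	raw, err := json.Marshal(injected)
	if err != nil {
		return err
	}
	if job.Annotations == nil {
		job.Annotations = map[string]string{}
	}
	job.Annotations[injectedNodeAffinityKey] = string(raw)
	return nil
}

// injectedSelector reads the recorded selector back; it returns nil if the
// annotation is absent.
func injectedSelector(job *batchv1.Job) (map[string]string, error) {
	raw, ok := job.Annotations[injectedNodeAffinityKey]
	if !ok {
		return nil, nil
	}
	var selector map[string]string
	if err := json.Unmarshal([]byte(raw), &selector); err != nil {
		return nil, err
	}
	return selector, nil
}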

Why is this needed:

To make sure that when suspending a Job, we always restore the original nodeAffinity.

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

kerthcet added the kind/feature label Jan 16, 2023
@kerthcet
Contributor Author

cc @ahg-g as the first author.

@ahg-g
Contributor

ahg-g commented Jan 16, 2023

My concern is that this will add another update request for every job. Is this cost justified?

@alculquicondor
Contributor

alculquicondor commented Feb 1, 2023

It can be the same API call that updates the Job spec.

However, maybe we should store the original node selector instead of the injected one.

@ahg-g
Contributor

ahg-g commented Feb 28, 2023

Note that when the job is suspended, the controller will reset the nodeSelector on the job:

if w != nil && !equality.Semantic.DeepEqual(job.Spec.Template.Spec.NodeSelector,
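
For context, a minimal self-contained sketch of the reset that the truncated snippet above refers to. The helper name restoreNodeSelector and the idea that the caller supplies the original selector are assumptions for illustration, not kueue's actual code:

package controller

import (
	batchv1 "k8s.io/api/batch/v1"
	"k8s.io/apimachinery/pkg/api/equality"
)

// restoreNodeSelector puts originalSelector back on the Job if the current
// one diverged (e.g. because kueue injected scheduling directives on
// admission). It reports whether the Job object needs an update.
func restoreNodeSelector(job *batchv1.Job, originalSelector map[string]string) bool {
	if equality.Semantic.DeepEqual(job.Spec.Template.Spec.NodeSelector, originalSelector) {
		return false
	}
	job.Spec.Template.Spec.NodeSelector = originalSelector
	return true
}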

@kerthcet
Contributor Author

kerthcet commented Feb 28, 2023

Note that when the job is suspended, the controller will reset the nodeSelector on the job:

Yes, but there's a corner case: the job is unsuspended, yet the workload is deleted under unknown circumstances (e.g., deleted manually); then the job will keep a wrongly configured nodeSelector.

@mcariatm
Contributor

mcariatm commented Mar 6, 2023

/assign

@ahg-g
Contributor

ahg-g commented Mar 6, 2023

Note that when the job is suspended, the controller will reset the nodeSelector on the job:

Yes, but there's a corner case: the job is unsuspended, yet the workload is deleted under unknown circumstances (e.g., deleted manually); then the job will keep a wrongly configured nodeSelector.

Right, it is OK to record the original nodeSelector as long as it is done in the same update request that sets the nodeSelector, but I think it is worth discussing whether we want to tie the workload's lifecycle to the job using finalizers.

@alculquicondor
Contributor

Deleting a Workload seems like an important tool for forcing a requeue, and adding finalizers could further complicate this use case.

@kerthcet
Contributor Author

kerthcet commented Mar 9, 2023

whether we want to tie the workload's lifecycle to the job using finalizers

Add a finalizer to the workload; when the workload is to be deleted, restore the node selector on the Job. That seems more convincing. I'd prefer not to use an annotation if we can avoid it. But yes, we will have to handle the terminating workload in job reconciliation.

@alculquicondor
Contributor

it's /assign

@alculquicondor
Contributor

@mimowo could you take on a review for a future PR on this, in the context of the job integration framework?

@trasc
Contributor

trasc commented Mar 22, 2023

/assign

mcariatm removed their assignment Mar 22, 2023
@mimowo
Contributor

mimowo commented Mar 22, 2023

@mimowo could you take on a review for a future PR on this, in the context of the job integration framework?

Sure.

@alculquicondor
Contributor

I'm now wondering whether we should revert this, as we have a growing number of annotations to support partial admission.

In support of this feature, we have resiliency: we can lose the Workload object and still recover the job.

However, is it worth it?

cc @tenzen-y @trasc

I think we can just document that users (including admins) shouldn't remove a Workload object.

@alculquicondor
Contributor

/reopen

k8s-ci-robot reopened this May 31, 2023
@k8s-ci-robot
Contributor

@alculquicondor: Reopened this issue.

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@kerthcet
Contributor Author

kerthcet commented Jun 1, 2023

I think we can just document that users (including admins) shouldn't remove a Workload object

Or we could leverage finalizers here: when deleting jobs, we'd restore the node affinity and then remove the finalizer.

@alculquicondor
Contributor

The finalizers alternative requires careful thinking too. We don't want to accidentally leave objects with finalizers. The problem here is that the Job object is a parent of the Workload object. As such, the Job can't be deleted unless the Workload is deleted first. Then we have a circular dependency.

Another alternative is that a Workload has a finalizer if it's admitted. But the complication is that finalizers are not part of the status, so we need additional API calls.

Not sure if it's worth the effort, but worth exploring.

@kerthcet
Contributor Author

kerthcet commented Jun 1, 2023

RE: it's not deleting the job but deleting the workload. The reason we can't restore the node affinity is that the workload might be deleted accidentally. Now we add the finalizer to the workload; when we want to delete the workload, we first restore the Job, then remove the finalizer, and the workload finally gets deleted.
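
A minimal controller-runtime sketch of that flow, assuming a hypothetical finalizer name and a restoreJob helper; the real implementation details would differ:

package controller

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

	kueue "sigs.k8s.io/kueue/apis/kueue/v1beta1"
)

const workloadFinalizer = "kueue.x-k8s.io/restore-job" // hypothetical name

type WorkloadReconciler struct {
	client client.Client
}

func (r *WorkloadReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	var wl kueue.Workload
	if err := r.client.Get(ctx, req.NamespacedName, &wl); err != nil {
		return ctrl.Result{}, client.IgnoreNotFound(err)
	}
	if !wl.DeletionTimestamp.IsZero() && controllerutil.ContainsFinalizer(&wl, workloadFinalizer) {
		// The Workload is terminating: restore the owning Job's original
		// nodeSelector first, then release the finalizer so deletion proceeds.
		if err := r.restoreJob(ctx, &wl); err != nil {
			return ctrl.Result{}, err
		}
		controllerutil.RemoveFinalizer(&wl, workloadFinalizer)
		return ctrl.Result{}, r.client.Update(ctx, &wl)
	}
	return ctrl.Result{}, nil
}

// restoreJob is a placeholder for writing the original scheduling directives
// back onto the parent Job before the Workload disappears.
func (r *WorkloadReconciler) restoreJob(ctx context.Context, wl *kueue.Workload) error {
	return nil // elided
}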

@tenzen-y
Member

tenzen-y commented Jun 1, 2023

I'm now wondering whether we should revert this, as we have a growing number of annotations to support partial admission.

Yea... Our annotations are so big...

It's not deleting the job but deleting the workload. The reason we can't restore the node affinity is that the workload might be deleted accidentally. Now we add the finalizer to the workload; when we want to delete the workload, we first restore the Job, then remove the finalizer, and the workload finally gets deleted.

That would probably work. But we need to evaluate whether it is worth doing even though it requires more API calls, as @alculquicondor says.

@trasc
Contributor

trasc commented Jun 6, 2023

An alternative could be to have an additional custom resource just to back up selectors and counts. This resource would be owned by the job, so once we create/update it before unsuspending the job, we would not need to keep track of its lifecycle.

We would have additional API calls, but they should not trigger any controller.

@alculquicondor @kerthcet @tenzen-y WDYT?
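
A rough sketch of the shape such a "backup" resource could take; the type name and fields are entirely hypothetical, not an actual kueue API:

package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// JobSpecBackup stores the scheduling directives and counts a Job had before
// kueue mutated it on admission. It would be owned by the Job, so it is
// garbage-collected together with it and needs no separate lifecycle tracking.
type JobSpecBackup struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// NodeSelector is the Job's original nodeSelector.
	NodeSelector map[string]string `json:"nodeSelector,omitempty"`
	// Parallelism is the Job's original parallelism (relevant for partial
	// admission).
	Parallelism *int32 `json:"parallelism,omitempty"`
}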

@alculquicondor
Contributor

Another CRD would have the same issue about needing a finalizer.

The workload is already an object that end-users shouldn't have permissions to edit or delete.

I prefer we get rid of the annotation, without any finalizer in the Workload. And re-evaluate in the future if we find a use case where end-users need to modify the Workload object.

@trasc
Contributor

trasc commented Jun 6, 2023

Another CRD would have the same issue about needing a finalizer.

Not exactly; there is no delete conditioning. When the job gets deleted, so is the "backup" resource.

@alculquicondor
Contributor

The Job also owns the Workload. So when we delete the Job, the Workload gets deleted as well.

The problem is what happens if someone (not Kueue) deletes the Workload prematurely.

@trasc
Contributor

trasc commented Jun 6, 2023

Yes, the problem is when the workload gets deleted before the restore; in that case the backup resource will still exist, and the restore can be done from it.

When the job gets deleted... we don't actually care what happens to the selectors and counts.

@alculquicondor
Contributor

alculquicondor commented Jun 6, 2023

But what's the difference between the "backup" resource and the Workload? They are both subject to unauthorized deletion. I don't see any difference, so I'd rather have one object.

@tenzen-y
Member

tenzen-y commented Jun 6, 2023

The key issue here is that end-users can accidentally remove the resource storing the original job information. So I think the problem isn't solved even if we introduce another CRD.

@trasc
Contributor

trasc commented Jun 6, 2023

I don't see accidental removal as a real problem; the chance of it happening is the same as for accidental removal of the job itself. However, with a separate resource, a queue administrator could ensure via RBAC that end-users can't "accidentally remove resources storing original job information".

During the review of the original implementation, I think workload deletion was presented as a valid way to requeue.

@alculquicondor
Contributor

alculquicondor commented Jun 6, 2023

Right... that's an easy way of evicting a workload and putting it at the front of the queue.

The alternative would be for the administrator to only delete the admission field in the status, but that puts the job at the back of the queue.

A finalizer would still be a more performant solution than having a second object. The question is which controller removes the finalizer (see the sketch after this list):

  • It could be the workload controller itself, based on whether the Admitted=true condition is present.
  • Or it would have to be each job controller, based on whether the original spec was restored after the Workload has a deletionTimestamp.
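
A minimal sketch of the first option, in which the workload controller itself decides it can release the finalizer because the terminating Workload was never admitted (so there is nothing to restore on the Job). The condition name and the helper are assumptions based on this discussion, not kueue's actual code:

package controller

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// canReleaseFinalizer reports whether the workload controller may remove the
// finalizer itself: the Workload is terminating and was never admitted, so
// no injected nodeSelector needs to be restored by a job controller.
func canReleaseFinalizer(deletionTimestamp *metav1.Time, conditions []metav1.Condition) bool {
	return deletionTimestamp != nil && !meta.IsStatusConditionTrue(conditions, "Admitted")
}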

@alculquicondor
Contributor

alculquicondor commented Jun 6, 2023

I would suggest we mark the deletion of a Workload as "unsupported behavior" to start.

@tenzen-y @kerthcet are you ok with that?

@tenzen-y
Member

tenzen-y commented Jun 6, 2023

I would suggest we mark the deletion of a Workload as "unsupported behavior" to start.

Does that mean we add a note to the documentation?

@alculquicondor
Contributor

yes

@tenzen-y
Member

tenzen-y commented Jun 6, 2023

Agree.

Additionally, we might want to consider another way to re-enqueue the job.

@kerthcet
Contributor Author

kerthcet commented Jun 7, 2023

SGTM, the ROI is high 😄

@mimowo
Contributor

mimowo commented Jun 7, 2023

Does it make sense to consider a knob in the Kueue configuration controlling whether to store the annotation?

Some users wouldn't be concerned about Job size (reasons may vary: (1) using non-indexed jobs, (2) using small node selectors, or (3) using indexed jobs with small parameters), but they may be concerned about losing track of node selectors.

@kerthcet
Contributor Author

kerthcet commented Jun 7, 2023

I think the motivation here is to avoid using too many annotations, from the POV of kueue. If we have the ambition to push this upstream, it would be a stumbling block. 🥲

@trasc
Contributor

trasc commented Jun 7, 2023

I've opened a new PR for this: #834. It's still in draft since it's developed on top of #771.

More interesting for this discussion are:

  • commit no. 1, which removes the use of the annotations, but then we should not delete workloads anymore
  • commit no. 2, which adds and manages a finalizer in the workloads (this is still WIP; it needs some cleanup and tests, but is functional)

Please have a look and let me know what you think.

@trasc
Contributor

trasc commented Jun 7, 2023

/assign

@alculquicondor
Contributor

alculquicondor commented Jun 7, 2023

Does it make sense to consider a knob in the Kueue configuration controlling whether to store the annotation?

I'd prefer we don't maintain such a piece of code. We also risk the annotation changing its name/contents from one version to the next, as we do more mutations during admission.
