KEP-168-2: Pending workloads visibility #1300
Conversation
Hi @PBundyra. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test

/assign @tenzen-y
> ### Goals
>
> - Support listing in order all pending workloads in a ClusterQueue, no matter the size of the queue, and without delay,
Will the implementation consider the ability to support paged queries? I think this is still pretty important, especially with a lot of workloads.
Since the K8s Aggregation Layer does not support pagination out of the box, we don't want to commit to it since it would require significant effort. However, it can be implemented on the client side. We also expose the endpoint to fetch information about a single workload, so there is a way to query its position without listing all the remaining pending workloads.
I've updated the non-goals and API Details sections to cover this.
Let's clarify what we mean by pagination.
Let's say I want to list 1000 elements now, and I can only see 100 at a time. Do you keep the 1000 elements somewhere in memory, with some key id, and allow users to query that list in 100 increments? In other words, you get a consistent view if you use the same id?
I don't think we need that. Simply listing elements from position X to Y, at this time, is enough.
Can we do something simple like having 2 query parameters for the API:
- first position
- number of elements (and this is some sane value by default: probably around 1000?)
Doing it in the client side (let's say a dashboard) could be wasteful.
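The "position X to Y" semantics proposed here could be sketched as a simple offset/limit slice over the in-memory list of pending workloads. This is an illustration only: the function name, parameter spellings, and the default of 1000 are assumptions from this discussion, not the final API.

```go
package main

import "fmt"

// defaultLimit is the assumed default page size discussed above.
const defaultLimit = 1000

// paginate returns the elements from position offset (0-based) up to
// offset+limit, clamping both ends to the slice bounds. A limit <= 0
// falls back to defaultLimit.
func paginate(items []string, offset, limit int) []string {
	if limit <= 0 {
		limit = defaultLimit
	}
	if offset < 0 {
		offset = 0
	}
	if offset >= len(items) {
		return nil
	}
	end := offset + limit
	if end > len(items) {
		end = len(items)
	}
	return items[offset:end]
}

func main() {
	queue := []string{"wl-a", "wl-b", "wl-c", "wl-d"}
	fmt.Println(paginate(queue, 1, 2)) // [wl-b wl-c]
	fmt.Println(paginate(queue, 3, 0)) // [wl-d]
}
```

Note this is a point-in-time view, matching the comment above: no server-side cursor or consistent snapshot is kept between requests.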
We don't know at the moment how to have extra query parameters, so we went for the simple proposal without pagination at all. In the prototype we implement it here:
kueue/pkg/visibility/api/pending_workload_CQ.go
Lines 80 to 91 in 5bbb00f
```go
func (m *pendingWorkloadsInCQ) Get(ctx context.Context, name string, opts *metav1.GetOptions) (runtime.Object, error) {
	var wls []v1alpha1.PendingWorkloadSummary
	for _, val := range m.kueueMgr.GetPendingWorkloadsInfo(name) {
		wls = append(wls, v1alpha1.PendingWorkloadSummary{
			ObjectMeta: metav1.ObjectMeta{
				Name:      val.Name,
				Namespace: val.Namespace,
			},
		})
	}
	return &v1alpha1.PendingWorkloadSummaryList{Items: wls}, nil
}
```
The `Get` method does not receive the `http.Request`, so we cannot read the query parameters there. It might be possible, but it is not obvious how to get them. AFAIK the metrics server also does not do pagination, so we don't have a good example of passing the query parameters.
> Can we do something simple like having 2 query parameters for the API:
>
> - first position
> - number of elements (and this is some sane value by default: probably around 1000?)
>
> Doing it in the client side (let's say a dashboard) could be wasteful.
I also think it's better to support this on the server side or not at all; doing it on the client side is not a good choice.
Cross linking: The Pod log subresource implements query parameters https://github.com/kubernetes/kubernetes/blob/029452198566a41bc39d04a1ec5bad3f37621a1c/pkg/registry/core/pod/rest/log.go#L78
It's a different interface.
Indeed, implementing a different interface allows us to pass query parameters. Hence, following @alculquicondor's proposal, we will enable users to fetch pending workloads from position X to Y.
(force-pushed from 7c5e70e to 4171fe6)
> ## Proposal
>
> - Add new API,
nit: The last point does not seem important enough to mention in this section.
Also, I would suggest formulating it to capture the main technical aspects of the proposal, without going into as much detail as the "Design Details" section (or something along the lines):
"Add a new Extension API server to expose on-demand endpoints for fetching information about pending workloads."
Then, it may also be worth adding something like:
"The returned information about pending workloads includes all the necessary information relevant for their position in the queue, along with the position itself. There are three such endpoints: (1) to list the pending workloads in ClusterQueue, (2) list the pending workloads in LocalQueue, and (3) get a specific pending workload."
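To make the three endpoints above concrete, they might look like the following. These paths are illustrative only: the ClusterQueue and single-workload paths appear elsewhere in this thread, while the LocalQueue path is a guess by analogy; the VERSION and namespace placeholders follow the same convention.

```
GET /apis/pending-workloads.kueue.x-k8s.io/VERSION/clusterqueues/CQ_NAME
GET /apis/pending-workloads.kueue.x-k8s.io/VERSION/namespaces/LQ_NAMESPACE/localqueues/LQ_NAME
GET /apis/pending-workloads.kueue.x-k8s.io/VERSION/namespaces/WL_NAMESPACE/workloads/WL_NAME
```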
LGTM overall. Left some nits, but it is IMO good enough to:
/assign @alculquicondor
> We introduce a new API that will extend the existing one.
>
> There will be separate endpoints for administrators and regular users. Each endpoint exposes information about a pending workload, such as:
Does this mean that there will be 2 endpoints for ClusterQueue, 2 for LocalQueue, etc.?
IIUC there is only one endpoint for ClusterQueue, and one for LocalQueue. These translate in an implied way into endpoints for admins and users, but this may depend on RBAC. We may rephrase this sentence to clarify.
For example: "There will be separate endpoints exposing the information about pending workloads for LocalQueues, and ClusterQueues". Or something along the lines.
Clarified
> #### List all pending workloads in ClusterQueue
>
> ```
> GET /apis/pending-workloads.kueue.x-k8s.io/VERSION/clusterqueues/CQ_NAME
> ```
Would something like this be possible?

```diff
-GET /apis/pending-workloads.kueue.x-k8s.io/VERSION/clusterqueues/CQ_NAME
+GET /apis/visibility.kueue.x-k8s.io/VERSION/clusterqueues/CQ_NAME/pending_workloads
```
We tried like this, but there seems to be a limitation of the framework, that the path can only have one fixed fragment. The fragment is the key in this map in the prototype:
kueue/pkg/visibility/api/install.go
Lines 45 to 47 in 5bbb00f
```go
visibilityServerResources := map[string]rest.Storage{
	"clusterqueues": pendingWorkloadsInCQ,
}
```
Then, depending on the value returned here: `return false`
However, I don't think we can have the path you suggest here within the framework.
It seems that `/` is treated specially.
We do this for pods https://github.com/kubernetes/kubernetes/blob/029452198566a41bc39d04a1ec5bad3f37621a1c/pkg/registry/core/rest/storage_core.go#L231
Thanks for pointing that out @alculquicondor. Indeed, it's possible to introduce that kind of subresource, so I'll change the KEP.
> #### Fetch information about a single Workload
>
> ```
> GET /apis/pending-workloads.kueue.x-k8s.io/VERSION/namespaces/WL_NAMESPACE/workloads/WL_NAME
> ```
What if the workload is admitted or finished? Would you see some information?
Not at the moment. However, for admitted or finished workloads users don't need extra information that is dynamically changing, such as position in the queue.
Ok, let's clarify in the KEP.

Maybe this is an argument for keeping the name of the API group as `pending-workloads.kueue.x-k8s.io`.

Let's go with that and we can iterate in the future as we see the need for new "visibility APIs".
I've clarified that this endpoint refers to a single pending workload
Maybe I can review this KEP in the next week.

I came back here today.
/approve
I'll leave lgtm to @mimowo
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: alculquicondor, PBundyra. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Also please squash
Other than #1300 (comment), LGTM.
(force-pushed from 8b7ca36 to 49a6991)
LGTM, just some nits.
- Update keps/168-2-pending-workloads-visibility/README.md (Co-authored-by: Yuki Iwai \<yuki.iwai.tz@gmail.com\>)
- Update keps/168-2-pending-workloads-visibility/README.md (Co-authored-by: Michał Woźniak \<mimowo@users.noreply.github.com\>)
- Update keps/168-2-pending-workloads-visibility/kep.yaml (Co-authored-by: Aldo Culquicondor \<1299064+alculquicondor@users.noreply.github.com\>)
/lgtm
LGTM label has been added. Git tree hash: 2d1ff78485b8093d033a843de53e2267bc3a5ade
What type of PR is this?
/kind feature
What this PR does / why we need it:
Introduces a new API to expose information about the position of pending workloads in both ClusterQueue and LocalQueue.
Which issue(s) this PR fixes:
Part of #168
Special notes for your reviewer:
Does this PR introduce a user-facing change?