KEP-168-2: Pending workloads visibility #1300
Conversation
Hi @PBundyra. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. Once the patch is verified, the new status will be reflected by the `ok-to-test` label. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test

/assign @tenzen-y
> ### Goals
>
> - Support listing in order all pending workloads in a ClusterQueue, no matter the size of the queue, and without delay,
Will the implementation consider the ability to support paged queries? I think this is still pretty important, especially with a lot of workloads.
Since the K8s Aggregation Layer does not support pagination out of the box, we don't want to commit to it since it would require significant effort. However, it can be implemented on the client side. We also expose the endpoint to fetch information about a single workload, so there is a way to query its position without listing all the remaining pending workloads.
I've updated the non-goals and API Details sections to cover this.
Let's clarify what we mean by pagination.
Let's say I want to list 1000 elements now, and I can only see 100 at a time. Do you keep the 1000 elements somewhere in memory, with some key id, and allow users to query that list in 100 increments? In other words, you get a consistent view if you use the same id?
I don't think we need that. Simply listing elements from position X to Y, at this time, is enough.
Can we do something simple like having 2 query parameters for the API:
- first position
- number of elements (and this is some sane value by default: probably around 1000?)
Doing it in the client side (let's say a dashboard) could be wasteful.
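The "position X to Y" semantics proposed here could be sketched as a simple offset/limit slice over the in-memory list of pending workloads. This is an illustration only: the function name, parameter spellings, and the default of 1000 are assumptions from this discussion, not the final API.

```go
package main

import "fmt"

// defaultLimit is the assumed default page size discussed above.
const defaultLimit = 1000

// paginate returns the elements from position offset (0-based) up to
// offset+limit, clamping both ends to the slice bounds. A limit <= 0
// falls back to defaultLimit.
func paginate(items []string, offset, limit int) []string {
	if limit <= 0 {
		limit = defaultLimit
	}
	if offset < 0 {
		offset = 0
	}
	if offset >= len(items) {
		return nil
	}
	end := offset + limit
	if end > len(items) {
		end = len(items)
	}
	return items[offset:end]
}

func main() {
	queue := []string{"wl-a", "wl-b", "wl-c", "wl-d"}
	fmt.Println(paginate(queue, 1, 2)) // [wl-b wl-c]
	fmt.Println(paginate(queue, 3, 0)) // [wl-d]
}
```

Note this is a point-in-time view, matching the comment above: no server-side cursor or consistent snapshot is kept between requests.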
We don't know at the moment how to have extra query parameters, so we went for the simple proposal without pagination at all. In the prototype we implement it here:
kueue/pkg/visibility/api/pending_workload_CQ.go
Lines 80 to 91 in 5bbb00f
```go
func (m *pendingWorkloadsInCQ) Get(ctx context.Context, name string, opts *metav1.GetOptions) (runtime.Object, error) {
	var wls []v1alpha1.PendingWorkloadSummary
	for _, val := range m.kueueMgr.GetPendingWorkloadsInfo(name) {
		wls = append(wls, v1alpha1.PendingWorkloadSummary{
			ObjectMeta: metav1.ObjectMeta{
				Name:      val.Name,
				Namespace: val.Namespace,
			},
		})
	}
	return &v1alpha1.PendingWorkloadSummaryList{Items: wls}, nil
}
```
The `Get` method does not receive the `http.Request`, so we cannot read the query parameters there. It might be possible, but it is not obvious how to get them. AFAIK the metrics server also does not do pagination, so we don't have a good example of passing the query parameters.
> Can we do something simple like having 2 query parameters for the API:
>
> - first position
> - number of elements (and this is some sane value by default: probably around 1000?)
>
> Doing it in the client side (let's say a dashboard) could be wasteful.
I also think it's better to support this on the server side or not at all; doing it on the client side is not a good choice.
Cross linking: The Pod log subresource implements query parameters https://github.com/kubernetes/kubernetes/blob/029452198566a41bc39d04a1ec5bad3f37621a1c/pkg/registry/core/pod/rest/log.go#L78
It's a different interface.
Indeed, implementing a different interface allows us to pass query parameters. Hence, following @alculquicondor's proposal, we will enable users to fetch pending workloads from position X to Y.
(force-pushed from 7c5e70e to 4171fe6)
> ## Proposal
>
> - Add new API,
nit: The last point does not seem important enough to mention in this section.
Also, I would suggest formulating it to capture the main technical aspects of the proposal, without going into as much detail as the "Design Details" section (or something along the lines):
"Add a new Extension API server to expose on-demand endpoints for fetching information about pending workloads."
Then, it may also be worth adding something like:
"The returned information about pending workloads includes all the necessary information relevant for their position in the queue, along with the position itself. There are three such endpoints: (1) to list the pending workloads in ClusterQueue, (2) list the pending workloads in LocalQueue, and (3) get a specific pending workload."
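To make the three endpoints above concrete, they might look like the following. These paths are illustrative only: the ClusterQueue and single-workload paths appear elsewhere in this thread, while the LocalQueue path is a guess by analogy; the VERSION and namespace placeholders follow the same convention.

```
GET /apis/pending-workloads.kueue.x-k8s.io/VERSION/clusterqueues/CQ_NAME
GET /apis/pending-workloads.kueue.x-k8s.io/VERSION/namespaces/LQ_NAMESPACE/localqueues/LQ_NAME
GET /apis/pending-workloads.kueue.x-k8s.io/VERSION/namespaces/WL_NAMESPACE/workloads/WL_NAME
```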
LGTM overall. Left some nits, but it is IMO good enough to:
/assign @alculquicondor
> We introduce a new API that will extend the existing one.
>
> There will be separate endpoints for administrators and regular users. Each endpoint exposes information about a pending workload, such as:
Does this mean that there will be 2 endpoints for ClusterQueue, 2 for LocalQueue, etc.?
IIUC there is only one endpoint for ClusterQueue, and one for LocalQueue. These translate in an implied way into endpoints for admins and users, but this may depend on RBAC. We may rephrase this sentence to clarify.
For example: "There will be separate endpoints exposing the information about pending workloads for LocalQueues, and ClusterQueues". Or something along the lines.
Clarified
> #### List all pending workloads in ClusterQueue
>
> ```
> GET /apis/pending-workloads.kueue.x-k8s.io/VERSION/clusterqueues/CQ_NAME
> ```
Would something like this be possible?

```diff
-GET /apis/pending-workloads.kueue.x-k8s.io/VERSION/clusterqueues/CQ_NAME
+GET /apis/visibility.kueue.x-k8s.io/VERSION/clusterqueues/CQ_NAME/pending_workloads
```
We tried like this, but there seems to be a limitation of the framework, that the path can only have one fixed fragment. The fragment is the key in this map in the prototype:
kueue/pkg/visibility/api/install.go
Lines 45 to 47 in 5bbb00f
```go
visibilityServerResources := map[string]rest.Storage{
	"clusterqueues": pendingWorkloadsInCQ,
}
```
Then, depending on the value returned here: `return false`
However, I don't think we can have the path you suggest here within the framework.
It seems that `/` is treated specially.
We do this for pods https://github.com/kubernetes/kubernetes/blob/029452198566a41bc39d04a1ec5bad3f37621a1c/pkg/registry/core/rest/storage_core.go#L231
Thanks for pointing that out @alculquicondor. Indeed, it's possible to introduce that kind of subresource, so I'll change the KEP.
> #### Fetch information about a single Workload
>
> ```
> GET /apis/pending-workloads.kueue.x-k8s.io/VERSION/namespaces/WL_NAMESPACE/workloads/WL_NAME
> ```
What if the workload is admitted or finished? Would you see some information?
Not at the moment. However, for admitted or finished workloads users don't need extra information that is dynamically changing, such as position in the queue.
Ok, let's clarify in the KEP.

Maybe this is an argument for keeping the name of the API group as `pending-workloads.kueue.x-k8s.io`.

Let's go with that and we can iterate in the future as we see the need for new "visibility APIs".
I've clarified that this endpoint refers to a single pending workload
Maybe I can review this KEP in the next week.

I came back here today.
/approve
I'll leave lgtm to @mimowo
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: alculquicondor, PBundyra. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
Also please squash
Other than #1300 (comment), LGTM.
(force-pushed from 8b7ca36 to 49a6991)
LGTM, just some nits.
- Update keps/168-2-pending-workloads-visibility/README.md (Co-authored-by: Yuki Iwai \<yuki.iwai.tz@gmail.com\>)
- Update keps/168-2-pending-workloads-visibility/README.md (Co-authored-by: Michał Woźniak \<mimowo@users.noreply.github.com\>)
- Update keps/168-2-pending-workloads-visibility/kep.yaml (Co-authored-by: Aldo Culquicondor \<1299064+alculquicondor@users.noreply.github.com\>)
/lgtm
LGTM label has been added. Git tree hash: 2d1ff78485b8093d033a843de53e2267bc3a5ade
What type of PR is this?
/kind feature
What this PR does / why we need it:
Introduces a new API to expose information about the position of pending workloads in both ClusterQueue and LocalQueue.
Which issue(s) this PR fixes:
Part of #168
Special notes for your reviewer:
Does this PR introduce a user-facing change?