
Add option flags to define nodeSelector, nodeAffinity and toleration on Knative Service #1924

Merged
merged 27 commits into from
May 3, 2024


Shashankft9
Member

Description

Adds the ability to assign Knative Services to nodes.

Changes

  • adds nodeSelector, nodeAffinity and toleration to podspec
  • adds update functions in podspec_helper.go for each of those fields
  • only supports ORed terms for the required clause of node affinity
  • supports previously added node selectors, but not removals for node affinity and tolerations, since there is no clear identifier for an individual entry.
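As a rough illustration of the last bullet, the sketch below merges new `key=value` node selector entries into the selectors a service already has, so previously added ones survive an update. It is stdlib-only Go; `updateNodeSelector` is a hypothetical name, not the function in podspec_helper.go, and the `key-` removal form is an assumption borrowed from kn's existing `--env NAME-` convention rather than something this PR states.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// updateNodeSelector returns a copy of existing with the "key=value" entries
// applied, so previously added selectors are preserved.
// ASSUMPTION: an entry of the form "key-" removes that key, mirroring kn's
// --env NAME- convention; the PR does not confirm this syntax.
func updateNodeSelector(existing map[string]string, entries []string) (map[string]string, error) {
	result := make(map[string]string, len(existing)+len(entries))
	for k, v := range existing {
		result[k] = v
	}
	for _, entry := range entries {
		if !strings.Contains(entry, "=") && strings.HasSuffix(entry, "-") {
			delete(result, strings.TrimSuffix(entry, "-"))
			continue
		}
		key, value, ok := strings.Cut(entry, "=")
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid node selector %q, expected key=value", entry)
		}
		result[key] = value
	}
	return result, nil
}

func main() {
	current := map[string]string{"zone": "edge-1"}
	updated, err := updateNodeSelector(current, []string{"Disktype=ssd", "zone-"})
	if err != nil {
		panic(err)
	}
	// Print in a stable order; Go map iteration order is random.
	keys := make([]string, 0, len(updated))
	for k := range updated {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, updated[k])
	}
}
```

Returning a copy rather than mutating the input keeps the caller's existing PodSpec untouched if parsing fails halfway.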

Reference

Fixes #1841

Release Note


knative-prow bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Mar 19, 2024

knative-prow bot left a comment

@Shashankft9: 0 warnings.


knative-prow bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Mar 19, 2024
@Shashankft9
Member Author

@dsimansk hey, I'm going to add some more tests and maybe refactor a few things. Early feedback would be appreciated!

Shashankft9 changed the title from "WIP: Assigning nodes" to "WIP: Assigning nodes to knative services" Mar 19, 2024

codecov bot commented Mar 19, 2024

Codecov Report

Attention: Patch coverage is 75.53957%, with 34 lines in your changes missing coverage. Please review.

Project coverage is 76.82%. Comparing base (cbb6f5c) to head (57eeed4).
Report is 9 commits behind head on main.

Files Patch % Lines
pkg/kn/flags/podspec_helper.go 77.27% 16 Missing and 9 partials ⚠️
pkg/kn/flags/podspec.go 76.00% 3 Missing and 3 partials ⚠️
pkg/util/parsing_helper.go 25.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1924      +/-   ##
==========================================
+ Coverage   74.58%   76.82%   +2.23%     
==========================================
  Files         207      207              
  Lines       15567    12892    -2675     
==========================================
- Hits        11611     9904    -1707     
+ Misses       3167     2187     -980     
- Partials      789      801      +12     


   --user int The user ID to run the container (e.g., 1001).
-  --volume stringArray Add a volume from a ConfigMap (prefix cm: or config-map:) a Secret (prefix secret: or sc:), an EmptyDir (prefix ed: or emptyDir:) or a PersistentVolumeClaim (prefix pvc: or persistentVolumeClaim). Example: --volume myvolume=cm:myconfigmap, --volume myvolume=secret:mysecret or --volume emptyDir:myvol:size=1Gi,type=Memory. You can use this flag multiple times. To unset a ConfigMap/Secret reference, append "-" to the name, e.g. --volume myvolume-.
+  --volume stringArray Add a volume from a ConfigMap (prefix cm: or config-map:) a Secret (prefix secret: or sc:), an EmptyDir (prefix ed: or emptyDir:) or a PersistentVolumeClaim (prefix pvc: or persistentVolumeClaim). PersistentVolumeClaim and EmptyDir only works if the feature gate is enabled in knative serving. Example: --volume myvolume=cm:myconfigmap, --volume myvolume=secret:mysecret or --volume emptyDir:myvol:size=1Gi,type=Memory. You can use this flag multiple times. To unset a ConfigMap/Secret reference, append "-" to the name, e.g. --volume myvolume-.
Contributor

That's not entirely correct. EmptyDir is enabled by default, per: https://github.com/knative/serving/blob/ba3f983855bc33555b8416bd745c773fbdd4079c/config/core/configmaps/features.yaml#L183

I see, PVC support is still not enabled by default. But I'd still prefer not to change those docs in this PR, and would rather create a new, targeted PR for it.

Comment on lines 88 to 95
# Create a service with node selector (if feature flag is enabled here: https://knative.dev/docs/serving/configuration/feature-flags)
kn service create nodeselectortest --image knativesamples/helloworld --node-selector Disktype="ssd"

# Create a service with toleration (if feature flag is enabled here: https://knative.dev/docs/serving/configuration/feature-flags)
kn service create tolerationtest --image knativesamples/helloworld --toleration Key="node-role.kubernetes.io/master",Effect="NoSchedule",Operator="Equal",Value=""

# Create a service with node affinity (if feature flag is enabled here: https://knative.dev/docs/serving/configuration/feature-flags)
kn service create nodeaffinitytest --image knativesamples/helloworld --node-affinity Type="Required",Key="topology.kubernetes.io/zone",Operator="In",Values="antarctica-east1 antarctica-east2"
Contributor

Preferably, the docs links should point to the exact feature anchor, e.g. https://knative.dev/docs/serving/configuration/feature-flags/#kubernetes-toleration.

@dsimansk
Contributor

@Shashankft9 thanks for looking into this feature. I'm still not 100% convinced we should have it, though.

Given that every option is hidden behind its own specific feature flag, there's no good way to determine whether the flags are actually usable on a given Serving instance until the request reaches the webhook, which might reject it.

I wonder what the error message from Serving's webhook looks like, and whether it gives users a good hint about why the Ksvc creation failed.

Moreover, looking at the verbosity of the required input, I doubt its overall usefulness. Subjectively, I'd opt for a Ksvc stored in YAML format for such advanced configuration.

I mean this kind of verbosity:

--node-affinity Type="Preferred",Key="topology.kubernetes.io/zone",Operator="In",Values="antarctica-east1",Weight="1"

/cc @rhuss any thoughts?
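The verbosity comes from each --node-affinity value describing a whole scheduling term. A stdlib-only sketch of how such values could map onto node affinity: each Type=Required entry becomes its own nodeSelectorTerm (Kubernetes ORs those terms, which matches the PR's "ORed terms" note), while Type=Preferred entries carry a Weight, which Kubernetes requires to be in the range 1-100. The types are simplified local stand-ins for the corev1 ones, and `addNodeAffinity` is hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Simplified, hypothetical stand-ins for corev1.NodeSelectorRequirement,
// corev1.NodeSelectorTerm, corev1.PreferredSchedulingTerm and corev1.NodeAffinity.
type Requirement struct {
	Key      string
	Operator string
	Values   []string
}

type Term struct {
	MatchExpressions []Requirement
}

type PreferredTerm struct {
	Weight     int32
	Preference Term
}

type NodeAffinity struct {
	Required  []Term // nodeSelectorTerms of the required clause; Kubernetes ORs them
	Preferred []PreferredTerm
}

// addNodeAffinity parses one --node-affinity value (quotes already removed
// by the shell) and appends it to na.
func addNodeAffinity(na *NodeAffinity, raw string) error {
	fields := map[string]string{}
	for _, pair := range strings.Split(raw, ",") {
		k, v, ok := strings.Cut(pair, "=")
		if !ok {
			return fmt.Errorf("invalid node affinity field %q, expected Name=value", pair)
		}
		fields[k] = v
	}
	// Values is a space-separated list, e.g. "antarctica-east1 antarctica-east2".
	req := Requirement{Key: fields["Key"], Operator: fields["Operator"], Values: strings.Fields(fields["Values"])}
	term := Term{MatchExpressions: []Requirement{req}}

	switch fields["Type"] {
	case "Required":
		// One new term per flag value, so repeated flags are ORed.
		na.Required = append(na.Required, term)
	case "Preferred":
		w, err := strconv.ParseInt(fields["Weight"], 10, 32)
		if err != nil || w < 1 || w > 100 {
			return fmt.Errorf("weight must be an integer in 1-100, got %q", fields["Weight"])
		}
		na.Preferred = append(na.Preferred, PreferredTerm{Weight: int32(w), Preference: term})
	default:
		return fmt.Errorf("unknown node affinity type %q", fields["Type"])
	}
	return nil
}

func main() {
	var na NodeAffinity
	for _, raw := range []string{
		"Type=Required,Key=topology.kubernetes.io/zone,Operator=In,Values=antarctica-east1 antarctica-east2",
		"Type=Preferred,Key=topology.kubernetes.io/zone,Operator=In,Values=antarctica-east1,Weight=1",
	} {
		if err := addNodeAffinity(&na, raw); err != nil {
			panic(err)
		}
	}
	fmt.Printf("required terms: %d, preferred terms: %d\n", len(na.Required), len(na.Preferred))
}
```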

flagNames = append(flagNames, "toleration")

flagset.StringSliceVar(&p.NodeAffinity, "node-affinity", []string{},
"Add node affinity to be set - only works if the feature gate is enabled in knative serving. When key, operator, values and weight are defined for a type, they will be appended in nodeSelectorTerms in case of Required clause, "+
Contributor

Maybe something like: "To use this flag, a feature flag must be enabled in the Knative Serving configuration."

Please capitalize "Knative Serving" in all occurrences.

@Shashankft9
Member Author

Shashankft9 commented Mar 19, 2024

@dsimansk thanks for the feedback, I'll work on the pointers. As for the usefulness, we currently have a use case that needs something like this in the kn CLI; let me try to justify it below:

In some of our edge cloud deployments and dev Kubernetes clusters, we use Knative Functions with Tekton (on-cluster builds). In addition to the current CLI form, we have used the function client to build a controller on top of it that provides a CRD-like UX for function creation. The on-cluster build currently has three Tekton tasks: git clone, build and deploy. This poses a problem when someone needs to add scaling configuration, TLS or other Serving-specific settings. That is quite easy with the func CLI, but with on-cluster builds the user has to change func.yaml, commit it to git and then rerun the on-cluster build, which goes against the UX we want to provide through the CRD.
So, how do we solve this?
We added another Tekton task after deploy, which we call kn-patch. Our CRD now accepts scaling configuration, which we pass to this task; it applies all the required changes without the user ever touching git, or even knowing that the function lives in git.

Then we came across a scenario in our edge cloud deployments where certain nodes in a Kubernetes cluster were dedicated to specific workloads and had a different MTU configured, which affected the usual Service/Pod networking on those nodes. Our functions must not be scheduled on those nodes, so we have to pin them to other nodes, hence the need for these mechanisms in kn.

Does that make sense from a use-case perspective? I'm not sure how others use the kn CLI, but this is how we currently use it. Maybe this could also be useful for func's CRD and Operator story?

> I wonder what the error message from Serving's webhook looks like, and whether it gives users a good hint about why the Ksvc creation failed.

I can check and post the result here, but I think the last time I tried, Serving's webhook gave a clear hint about it.

@Shashankft9
Member Author

Here's what it looks like when I disable node affinity, node selector and toleration in the feature flags:

root@faas-cluster-xnts4:~# kn service create nodeaffinitytest --image knativesamples/helloworld --node-affinity Type="Required",Key="topology.kubernetes.io/zone",Operator="In",Values="antarctica-east1 antarctica-east2"
Error: admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: must not set the field(s): spec.template.spec.affinity
Run 'kn --help' for usage
root@faas-cluster-xnts4:~# kn service create tolerationtest --image knativesamples/helloworld --toleration Key="node-role.kubernetes.io/master",Effect="NoSchedule",Operator="Equal",Value=""
Error: admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: must not set the field(s): spec.template.spec.tolerations
Run 'kn --help' for usage
root@faas-cluster-xnts4:~# kn service create nodeselectortest --image knativesamples/helloworld --node-selector Disktype="ssd"
Error: admission webhook "validation.webhook.serving.knative.dev" denied the request: validation failed: must not set the field(s): spec.template.spec.nodeSelector
Run 'kn --help' for usage

@dsimansk
Contributor

Thanks for the extensive reply addressing the usefulness concern. I'm getting a better picture now.

One way to address my concern might be to introduce an "experimental" or "advanced" section in the help message, i.e. to de-clutter the current list of kn service flags by splitting them into sections, and to clearly indicate that these flags require additional configuration, such as enabling a feature flag in Serving.
I'll take a look at how to achieve it in spf13/cobra; I recall there are a few options for creating subsections. And, of course, descriptive subsection names (naming is always the hardest part :)).

knative-prow bot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Apr 22, 2024
knative-prow-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Apr 22, 2024
knative-prow bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 23, 2024
Shashankft9 changed the title from "WIP: Assigning nodes to knative services" to "Assigning nodes to knative services" Apr 23, 2024
knative-prow bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Apr 23, 2024
@Shashankft9
Member Author

Shashankft9 commented Apr 23, 2024

Hello @dsimansk, this is ready for another review. I have reverted the changes for creating subsections in flags, and added tests.

Context: the flag subsection work was reverted because the required feature is still pending in cobra; more here: https://cloud-native.slack.com/archives/C04LY4SKBQR/p1713175380354749

@dsimansk
Contributor

@Shashankft9 could you please rerun ./hack/build.sh -c? That should run gofmt and fix the formatting issues.

@@ -159,7 +162,7 @@ func (p *PodSpecFlags) AddFlags(flagset *pflag.FlagSet) []string {
flagset.StringArrayVarP(&p.Volume, "volume", "", []string{},
"Add a volume from a ConfigMap (prefix cm: or config-map:) a Secret (prefix secret: or sc:), "+
"an EmptyDir (prefix ed: or emptyDir:) or a PersistentVolumeClaim (prefix pvc: or persistentVolumeClaim). "+
"Example: --volume myvolume=cm:myconfigmap, --volume myvolume=secret:mysecret or --volume emptyDir:myvol:size=1Gi,type=Memory. "+
"PersistentVolumeClaim only works if the feature gate is enabled here: https://knative.dev/docs/serving/configuration/feature-flags/#kubernetes-persistentvolumeclaim-pvc. Example: --volume myvolume=cm:myconfigmap, --volume myvolume=secret:mysecret or --volume emptyDir:myvol:size=1Gi,type=Memory. "+
Contributor

dsimansk Apr 24, 2024

I'd rather refer to "Knative Serving feature flags configuration" without the exact URL, as a URL might not age well over time.

Comment on lines +481 to +482
//TODO: only supporting ORed terms, also support ANDed expressions in a single term
//TODO: only supporting matchExpressions, also support matchFields
Contributor

@Shashankft9 would you like to capture those in a follow-up and address them later?
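The first TODO hinges on Kubernetes semantics: nodeSelectorTerms are ORed, while the matchExpressions inside a single term are ANDed. A small evaluator makes the difference concrete. It is a simplified sketch (only the In and NotIn operators, hypothetical local types), not the scheduler's implementation.

```go
package main

import "fmt"

// Requirement and Term are simplified, hypothetical stand-ins for
// corev1.NodeSelectorRequirement and corev1.NodeSelectorTerm.
type Requirement struct {
	Key      string
	Operator string // only "In" and "NotIn" are handled in this sketch
	Values   []string
}

type Term struct {
	MatchExpressions []Requirement
}

// matches reports whether a node's labels satisfy one requirement.
func (r Requirement) matches(labels map[string]string) bool {
	value, present := labels[r.Key]
	in := false
	if present {
		for _, v := range r.Values {
			if v == value {
				in = true
				break
			}
		}
	}
	switch r.Operator {
	case "In":
		return in
	case "NotIn":
		return !in // a missing label also satisfies NotIn
	default:
		return false
	}
}

// nodeMatches ORs the terms and ANDs the expressions inside each term.
// An empty term matches nothing, as in Kubernetes.
func nodeMatches(terms []Term, labels map[string]string) bool {
	for _, t := range terms {
		if len(t.MatchExpressions) == 0 {
			continue
		}
		all := true
		for _, r := range t.MatchExpressions {
			if !r.matches(labels) {
				all = false
				break
			}
		}
		if all {
			return true
		}
	}
	return false
}

func main() {
	node := map[string]string{"topology.kubernetes.io/zone": "antarctica-east1", "disktype": "hdd"}
	zone := Requirement{"topology.kubernetes.io/zone", "In", []string{"antarctica-east1"}}
	ssd := Requirement{"disktype", "In", []string{"ssd"}}

	ored := []Term{{MatchExpressions: []Requirement{zone}}, {MatchExpressions: []Requirement{ssd}}}
	anded := []Term{{MatchExpressions: []Requirement{zone, ssd}}}

	fmt.Println(nodeMatches(ored, node), nodeMatches(anded, node)) // true false
}
```

Supporting ANDed expressions would therefore mean letting one flag value (or a group of values) contribute several requirements to the same term, instead of always opening a new term.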

Comment on lines +476 to +477
if value == "Required" {
nodeAffinityType = "Required"
Contributor

It's a small nit: I would probably make it less restrictive, e.g. strings.ToLower(value) == "required". We do that in other places that parse such inputs. But it doesn't need to be addressed immediately.
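The suggested nit could look roughly like this: a hypothetical `parseAffinityType` that accepts any letter case and returns the canonical spelling, so the rest of the code can keep comparing against "Required" and "Preferred". strings.EqualFold would work equally well for one-off comparisons.

```go
package main

import (
	"fmt"
	"strings"
)

// parseAffinityType accepts "Required" or "Preferred" in any letter case and
// returns the canonical spelling (hypothetical helper, not the PR's code).
func parseAffinityType(value string) (string, error) {
	switch strings.ToLower(value) {
	case "required":
		return "Required", nil
	case "preferred":
		return "Preferred", nil
	}
	return "", fmt.Errorf("invalid node affinity type %q, expected Required or Preferred", value)
}

func main() {
	for _, v := range []string{"Required", "required", "PREFERRED", "soft"} {
		t, err := parseAffinityType(v)
		fmt.Printf("%q -> %q, err=%v\n", v, t, err)
	}
}
```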

dsimansk added the kind/feature New feature or request label May 3, 2024

Contributor

dsimansk left a comment

@Shashankft9 sorry for the delayed replies. I'm going to proceed with the PR and merge it. But please see my last comments. IMO those can be addressed in future iterations.

Thanks!

/approve
/lgtm

knative-prow bot added the lgtm Indicates that a PR is ready to be merged. label May 3, 2024

knative-prow bot commented May 3, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dsimansk, Shashankft9


knative-prow bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 3, 2024
knative-prow bot merged commit 66ddaf6 into knative:main May 3, 2024
23 checks passed
dsimansk changed the title from "Assigning nodes to knative services" to "Add option flags to define nodeSelector, nodeAffinity and toleration on Knative Service" May 3, 2024
@Shashankft9
Member Author

@dsimansk ack, I can work on those improvements in a follow-up PR. Thanks!

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature New feature or request lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Ability to set toleration, affinity and node selector
3 participants