
Add resource limits into ResourceGroup of ClusterQueue/Cohort #3215

Open
3 tasks
FillZpp opened this issue Oct 12, 2024 · 5 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@FillZpp

FillZpp commented Oct 12, 2024

What would you like to be added:

Perhaps a limits field could be added to the ResourceGroup struct of ClusterQueue, or of the new Cohort CRD.

Like this:

apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
spec:
  cohort: "foo"
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    limits:  # proposed field, not part of the current API
      # possibly also limits for cpu/memory
      nvidia.com/gpu: 16
    flavors:
    - name: "group-0"
      resources:
      - name: "cpu"
        nominalQuota: 64
      - name: "memory"
        nominalQuota: 128Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8
    - name: "group-1"
      resources:
      - name: "cpu"
        nominalQuota: 32
      - name: "memory"
        nominalQuota: 64Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 4

Why is this needed:

What we want: even though a ResourceGroup has several flavors, each with its own nominalQuota/borrowingLimit, and the ClusterQueue can borrow from other ClusterQueues in the same cohort, we still need an overall limit for the ResourceGroup that covers all the flavors used in it.
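A minimal sketch of how the proposed check could behave, assuming the group-level limit is applied on top of the existing per-flavor quotas: a workload fits a flavor only if that flavor's usage stays within nominalQuota + borrowingLimit and the total usage across all flavors in the ResourceGroup stays within the group limit. FlavorQuota, fits and group_limit are hypothetical names for illustration, not Kueue APIs, and the borrowing limits of 8 below are assumed values not taken from the example YAML. Whether the cohort actually has spare capacity to lend is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FlavorQuota:
    # Hypothetical model of one flavor's quota for a single resource (e.g. GPUs).
    nominal: int
    borrowing_limit: Optional[int] = None  # None means no per-flavor borrowing cap
    used: int = 0


def fits(flavors: Dict[str, FlavorQuota], flavor: str, request: int,
         group_limit: Optional[int]) -> bool:
    """Return True if `request` units can be placed on `flavor`.

    Checks the per-flavor ceiling (nominal + borrowing_limit) and then the
    proposed ResourceGroup-wide limit, summed over every flavor in the group.
    """
    q = flavors[flavor]
    if q.borrowing_limit is not None and q.used + request > q.nominal + q.borrowing_limit:
        return False
    if group_limit is not None:
        total_used = sum(f.used for f in flavors.values())
        if total_used + request > group_limit:
            return False
    return True


gpus = {
    "group-0": FlavorQuota(nominal=8, borrowing_limit=8, used=10),
    "group-1": FlavorQuota(nominal=4, borrowing_limit=8, used=6),
}
# Per-flavor ceilings alone would allow 16 + 12 = 28 GPUs; the group limit is 16.
print(fits(gpus, "group-1", 1, group_limit=16))    # False: total would be 17
print(fits(gpus, "group-1", 1, group_limit=None))  # True: 7 <= 4 + 8
```

The point of the group limit is visible in the first call: the request is fine for group-1 on its own, but it would push the ResourceGroup as a whole past 16.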

Does it make sense?

Completion requirements:

This enhancement requires the following artifacts:

  • Design doc
  • API change
  • Docs update

The artifacts should be linked in subsequent comments.

@FillZpp FillZpp added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 12, 2024
@KunWuLuan
Contributor

Hi @FillZpp, is borrowingLimit enough for the use cases?

@FillZpp
Author

FillZpp commented Oct 14, 2024

> Hi @FillZpp, is borrowingLimit enough for the use cases?

Unfortunately, no. Since I have multiple flavors with different borrowingLimits, I still need an overall resource limit for this ClusterQueue, which is probably less than the sum of nominalQuota + borrowingLimit over all flavors.

@KunWuLuan
Contributor

KunWuLuan commented Oct 14, 2024

Are the limits larger than the sum of all flavors' nominalQuota?

@mimowo
Contributor

mimowo commented Oct 14, 2024

I suppose I see the use case, but in the example resourceGroups.limits.gpu=16 is more than the total capacity of the flavors (8 + 4 = 12). And there is no cohort, so IIUC the limit would never be reached anyway. So I'm not sure the example is representative of your use case, is it?

@FillZpp
Author

FillZpp commented Oct 14, 2024

@mimowo Oh, it does have a cohort and can borrow from another ClusterQueue. I just simplified the YAML. Sorry for the confusion...

@KunWuLuan Not necessarily. Say the ClusterQueue has a total limit, but I can't clearly split it into a percentage for each flavor from the start, not even for the nominalQuota. That way, when part of an earlier flavor's nominalQuota has been borrowed by another CQ, this CQ can use as much of the next flavor as possible instead of reclaiming the borrowed resources.
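The scenario above can be sketched as follows, assuming flavors are tried in the listed order and a workload is placed entirely on the first flavor with room. `available` is a hypothetical per-flavor figure: what this ClusterQueue can still obtain from that flavor after other ClusterQueues borrowed from it and after whatever it may borrow from the cohort. pick_flavor and the numbers for group-1 are illustrative assumptions, not Kueue APIs or behavior; real admission also involves preemption and reclaim, which are left out.

```python
from typing import Dict, List, Optional


def pick_flavor(order: List[str], available: Dict[str, int], used: Dict[str, int],
                request: int, group_limit: int) -> Optional[str]:
    """Return the first flavor in `order` that can host `request`, or None.

    The ResourceGroup-wide limit is checked first against the usage summed over
    all flavors; then flavors are tried in order without reclaiming anything.
    """
    if sum(used.values()) + request > group_limit:
        return None  # would exceed the overall ResourceGroup limit
    for flavor in order:
        if request <= available[flavor]:
            return flavor
    return None


order = ["group-0", "group-1"]
# Another CQ borrowed 4 of group-0's 8 nominal GPUs and this CQ uses the other 4,
# so group-0 has nothing left; group-1 offers 4 nominal + 10 borrowable (assumed).
used = {"group-0": 4, "group-1": 0}
available = {"group-0": 0, "group-1": 14}
print(pick_flavor(order, available, used, 12, group_limit=16))  # group-1
print(pick_flavor(order, available, used, 13, group_limit=16))  # None: 4 + 13 > 16
```

Instead of reclaiming the GPUs lent from group-0, the workload spills over to group-1, while the group limit still caps the ClusterQueue's total at 16 even though group-1 alone could offer 14.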
