[site] Add cluster queue resource metrics #984

Merged
1 change: 1 addition & 0 deletions site/content/en/docs/installation/_index.md
```diff
@@ -94,6 +94,7 @@ data:
       healthProbeBindAddress: :8081
     metrics:
       bindAddress: :8080
+      # enableClusterQueueResources: true
     webhook:
       port: 9443
     manageJobsWithoutQueueName: true
```
10 changes: 10 additions & 0 deletions site/content/en/docs/reference/metrics.md
@@ -29,3 +29,13 @@ Use the following metrics to monitor the status of your ClusterQueues:
| `kueue_admission_wait_time_seconds` | Histogram | The time between a Workload's creation and its admission. | `cluster_queue`: the name of the ClusterQueue |
| `kueue_admitted_active_workloads` | Gauge | The number of admitted Workloads that are active (unsuspended and not finished) | `cluster_queue`: the name of the ClusterQueue |
| `kueue_cluster_queue_status` | Gauge | Reports the status of the ClusterQueue | `cluster_queue`: The name of the ClusterQueue<br> `status`: Possible values are `pending`, `active` or `terminated`. For a ClusterQueue, the metric only reports a value of 1 for one of the statuses. |
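Because only one `status` series per ClusterQueue carries the value 1, a scraper can recover each queue's current state by keeping that series and discarding the rest. The following is a minimal Python sketch, assuming the metrics are read in the Prometheus text exposition format. The function name `current_statuses` and the queue names in the example are hypothetical, and the regex-based parsing is deliberately simplified (it does not handle escaped quotes in label values):

```python
import re

# One sample line of the Prometheus text exposition format, e.g.
#   kueue_cluster_queue_status{cluster_queue="team-a",status="active"} 1
# Comment lines ("# HELP", "# TYPE") never match because "#" cannot start a metric name.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
# Simplified label parser: assumes label values contain no escaped quotes.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def current_statuses(exposition: str) -> dict[str, str]:
    """Map each ClusterQueue name to the status whose gauge value is 1."""
    statuses: dict[str, str] = {}
    for line in exposition.splitlines():
        match = SAMPLE_RE.match(line.strip())
        # Skip comments, blank lines, and every metric other than the status gauge.
        if not match or match.group("name") != "kueue_cluster_queue_status":
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if float(match.group("value")) == 1.0:
            statuses[labels["cluster_queue"]] = labels["status"]
    return statuses


if __name__ == "__main__":
    # Hypothetical scrape output for two ClusterQueues.
    sample = """\
# TYPE kueue_cluster_queue_status gauge
kueue_cluster_queue_status{cluster_queue="team-a",status="active"} 1
kueue_cluster_queue_status{cluster_queue="team-a",status="pending"} 0
kueue_cluster_queue_status{cluster_queue="team-b",status="pending"} 1
"""
    print(current_statuses(sample))  # {'team-a': 'active', 'team-b': 'pending'}
```

In practice a Prometheus query that filters on the value 1 achieves the same thing server-side; the sketch only illustrates the "exactly one status reports 1" invariant.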

### Optional metrics

The following metrics are available only if `metrics.enableClusterQueueResources` is set to `true` in the [manager's configuration](/docs/installation/#install-a-custom-configured-released-version).

| Metric name | Type | Description | Labels |
| ----------- | ---- | ----------- | ------ |
| `kueue_cluster_queue_resource_usage` | Gauge | Reports the ClusterQueue's total resource usage | `cohort`: The cohort the ClusterQueue belongs to<br> `cluster_queue`: The name of the ClusterQueue<br> `flavor`: The name of the referenced ResourceFlavor<br> `resource`: The resource name |
| `kueue_cluster_queue_nominal_quota` | Gauge | Reports the ClusterQueue's nominal resource quota | `cohort`: The cohort the ClusterQueue belongs to<br> `cluster_queue`: The name of the ClusterQueue<br> `flavor`: The name of the referenced ResourceFlavor<br> `resource`: The resource name |
| `kueue_cluster_queue_borrowing_limit` | Gauge | Reports the ClusterQueue's resource borrowing limit | `cohort`: The cohort the ClusterQueue belongs to<br> `cluster_queue`: The name of the ClusterQueue<br> `flavor`: The name of the referenced ResourceFlavor<br> `resource`: The resource name |
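The three optional gauges share the same labels, so one sample from each can be combined per (`cluster_queue`, `flavor`, `resource`). The following is a minimal Python sketch under stated assumptions: usage above the nominal quota is counted as borrowed, the borrowing limit caps how much may be borrowed, and `None` stands for "no borrowing limit". How an unset limit actually appears in the exported gauge is not covered here, so check that before relying on it. The function `quota_summary` and the sample numbers are hypothetical:

```python
import math
from typing import Optional


def quota_summary(usage: float, nominal_quota: float,
                  borrowing_limit: Optional[float]) -> dict[str, float]:
    """Combine the three optional gauges for one (cluster_queue, flavor, resource).

    Assumptions of this sketch: usage above nominal_quota counts as borrowed,
    and borrowing_limit=None means "no limit" (borrowing headroom is infinite).
    """
    # Fraction of the nominal quota in use; > 1.0 means the queue is borrowing.
    if nominal_quota > 0:
        utilization = usage / nominal_quota
    else:
        utilization = 0.0 if usage == 0 else math.inf
    # Amount consumed beyond the queue's own nominal quota.
    borrowed = float(max(0.0, usage - nominal_quota))
    # Upper bound on further borrowing; the cohort must also have unused quota.
    if borrowing_limit is None:
        headroom = math.inf
    else:
        headroom = max(0.0, borrowing_limit - borrowed)
    return {"utilization": utilization, "borrowed": borrowed, "borrow_headroom": headroom}


if __name__ == "__main__":
    # Hypothetical values for one (cluster_queue, flavor, resource) series.
    print(quota_summary(usage=12, nominal_quota=10, borrowing_limit=5))
    # {'utilization': 1.2, 'borrowed': 2.0, 'borrow_headroom': 3.0}
```

The headroom value is only an upper bound: whether a ClusterQueue can actually borrow also depends on how much quota the other ClusterQueues in its cohort leave unused.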