Krkn telemetry integration #435

Merged: 23 commits, Aug 10, 2023

tsebastiani
Collaborator

Adding telemetry data collection to krkn.

@tsebastiani
Collaborator Author

/funtest

@chaitanyaenr
Collaborator

Testing update: successfully ran hog scenarios with telemetry enabled. Here is the output stored in S3:

{
    "scenarios": [
        {
            "startTimeStamp": 1689037675,
            "endTimeStamp": 1689037762.1072097,
            "scenario": "scenarios/arcaflow/cpu-hog/input.yaml",
            "exitStatus": 0,
            "parametersBase64": "",
            "parameters": {
                "input_list": [
                    {
                        "cpu_count": 1,
                        "cpu_load_percentage": 80,
                        "cpu_method": "all",
                        "duration": "30s",
                        "kubeconfig": "anonymized",
                        "namespace": "default",
                        "node_selector": {}
                    }
                ]
            }
        },
        {
            "startTimeStamp": 1689037762,
            "endTimeStamp": 1689037797.4187903,
            "scenario": "scenarios/arcaflow/memory-hog/input.yaml",
            "exitStatus": 0,
            "parametersBase64": "",
            "parameters": {
                "input_list": [
                    {
                        "duration": "30s",
                        "kubeconfig": "anonymized",
                        "namespace": "default",
                        "node_selector": {},
                        "vm_bytes": "10%",
                        "vm_workers": 2
                    }
                ]
            }
        }
    ]
}
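
For readers who want to check a document like the one above, here is a minimal sketch that computes the duration and outcome of each scenario. It uses only the field names visible in the output (`startTimeStamp`, `endTimeStamp`, `scenario`, `exitStatus`); the `summarize` helper is hypothetical and is not part of krkn, and the embedded JSON is a trimmed copy with the `parameters` fields omitted.

```python
import json

# Trimmed copy of the telemetry output shown above (parameters omitted).
telemetry_json = """
{
    "scenarios": [
        {
            "startTimeStamp": 1689037675,
            "endTimeStamp": 1689037762.1072097,
            "scenario": "scenarios/arcaflow/cpu-hog/input.yaml",
            "exitStatus": 0
        },
        {
            "startTimeStamp": 1689037762,
            "endTimeStamp": 1689037797.4187903,
            "scenario": "scenarios/arcaflow/memory-hog/input.yaml",
            "exitStatus": 0
        }
    ]
}
"""


def summarize(document):
    """Hypothetical helper: one summary row per scenario entry."""
    data = json.loads(document)
    rows = []
    for entry in data["scenarios"]:
        rows.append({
            "scenario": entry["scenario"],
            # Timestamps are Unix epoch seconds, so the difference is in seconds.
            "duration_s": round(entry["endTimeStamp"] - entry["startTimeStamp"], 1),
            # Assumption: an exitStatus of 0 means the scenario succeeded.
            "succeeded": entry["exitStatus"] == 0,
        })
    return rows


for row in summarize(telemetry_json):
    print(row)
```

Both scenarios were configured with a 30s duration, so the gap between that and the measured wall-clock time gives a rough idea of setup and teardown overhead.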

Nice addition. A couple of possible enhancements:

  • Use the UUID generated during the run as the document name, so it doubles as an identifier
  • Record alerts triggered before and after chaos as separate entries
  • Consider making the OCP metadata part of the same document to make it easy to tie things together. Thoughts?
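
To make the three suggestions concrete, here is a rough sketch of what a combined document could look like. Everything in it is an assumption for discussion: the `run_uuid`, `cluster_metadata`, and `alerts` keys, the `<uuid>/telemetry.json` object name, and both helper functions are hypothetical and do not reflect what this PR implements.

```python
import json
import uuid


def telemetry_object_name(run_uuid, suffix="telemetry.json"):
    """Hypothetical naming scheme: one prefix per run, keyed by the run UUID."""
    return f"{run_uuid}/{suffix}"


def build_document(run_uuid, scenarios, cluster_metadata, alerts_pre, alerts_post):
    """Hypothetical document layout combining chaos data and OCP metadata."""
    return {
        "run_uuid": run_uuid,
        "scenarios": scenarios,
        # Assumption: cluster (OCP) metadata lives next to the scenario data.
        "cluster_metadata": cluster_metadata,
        # Pre- and post-chaos alerts are kept as separate entries.
        "alerts": {"pre_chaos": alerts_pre, "post_chaos": alerts_post},
    }


run_uuid = str(uuid.uuid4())
document = build_document(
    run_uuid,
    scenarios=[],
    cluster_metadata={},
    alerts_pre=[],
    alerts_post=[],
)
print(telemetry_object_name(run_uuid))
print(json.dumps(document, indent=4))
```

Keying the object name on the run UUID would let the telemetry document and any other artifact from the same run be found from a single identifier.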

@chaitanyaenr
Collaborator

chaitanyaenr commented Jul 27, 2023

Testing update:

Warnings around capturing builds

2023-07-27 19:27:00,903 [WARNING] CustomObjectsApi -> (404)
Reason: Not Found
{},"status":"Failure","message":"builds.config.openshift.io \"status\" not found","reason":"NotFound","details":{"name":"status","group":"config.openshift.io","kind":"builds"},"code":404}

Maybe we should print the details of the bucket where the data is stored at the end of the run, so that users with access can find it. Thoughts?
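
Something along these lines could work, as a minimal sketch only. The `telemetry_location_message` helper, the bucket name, and the object key are all hypothetical placeholders; the log format is chosen to resemble the warning line above, and the `s3://` prefix assumes the S3 storage mentioned in the earlier testing update.

```python
import logging


def telemetry_location_message(bucket, object_key):
    """Hypothetical helper: build the end-of-run message with the storage location."""
    return f"Telemetry data stored at s3://{bucket}/{object_key}"


# Format chosen to resemble the existing log lines, e.g.
# "2023-07-27 19:27:00,903 [WARNING] ...".
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s",
)

# Placeholder values; a real run would take these from its configuration.
logging.info(
    telemetry_location_message("example-telemetry-bucket", "example-run-id/telemetry.json")
)
```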

@tsebastiani tsebastiani force-pushed the telemetry branch 3 times, most recently from 85bc42f to af8f0df, August 2, 2023 10:23
@chaitanyaenr
Collaborator

@tsebastiani let's rebase this PR.

@tsebastiani
Collaborator Author

/funtest

chaitanyaenr added a commit to chaitanyaenr/kraken-hub that referenced this pull request Aug 10, 2023
This enables shipping telemetry data (chaos + OCP metadata) and a Prometheus
dump to a centralized location: krkn-chaos/krkn#435.
@tsebastiani
Collaborator Author

/funtest

@tsebastiani
Collaborator Author

/funtest

Collaborator

@chaitanyaenr chaitanyaenr left a comment

LGTM!

@chaitanyaenr chaitanyaenr merged commit 39c0152 into krkn-chaos:main Aug 10, 2023
2 checks passed
@openshift-ci openshift-ci bot added the lgtm label Aug 10, 2023
@openshift-ci

openshift-ci bot commented Aug 10, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: chaitanyaenr, tsebastiani

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tsebastiani tsebastiani deleted the telemetry branch August 11, 2023 07:30
chaitanyaenr added a commit to krkn-chaos/krkn-hub that referenced this pull request Aug 11, 2023
This enables shipping telemetry data (chaos + OCP metadata) and a Prometheus
dump to a centralized location: krkn-chaos/krkn#435.
3 participants