Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PROF-6558] Show profiler overhead in flamegraph for CPU Profiling 2.0 #2607

Merged
merged 9 commits into from
Feb 14, 2023

Conversation

ivoanjo
Copy link
Member

@ivoanjo ivoanjo commented Feb 8, 2023

What does this PR do?:

This PR changes the cpu-time/wall-time sampling in the CPU Profiling 2.0 profiler to expose the profiler overhead as part of the captured data.

Specifically, during sampling, the profiler now measures how much cpu-time/wall-time is spent by the profiler itself, and adds that to the profiler as a separate stack (that shows up as belonging to dd-trace-rb).

Motivation:

In the old Ruby profiler, the profiler always worked from a background thread. Because we also sampled that background thread, the flamegraph included the overhead of the profiler -- it was the overhead of the background thread.

With CPU Profiling 2.0, sampling is done in whatever thread is holding the Global VM Lock, which means that this part of the profiler overhead was not being exposed. With this change, this impact on code is now accounted for.

Additional Notes:

I was torn between a few options, and decided to use the stack of the CpuAndWallTimeWorker thread to represent the overhead of the profiler.

Thus, when this stack shows up in another thread, it represents the overhead of the profiler on that thread.

I could have gone with a placeholder (e.g. just a frame that says "Profiler Sampling" or something similar), but such placeholders break the "Only My Code" feature.

Using a stack from a different thread than the one we're supposedly "sampling" adds a bit of complexity; but we can always undo this and go with the placeholder in this becomes problematic.

How to test the change?:

Change includes code coverage.

I got tired of the extra boilerplate from using hashes and introduced
two small structs to contain the parsed profile data.
This will enable it to be reused when we want to add the profiler
overhead as a stack.
**What does this PR do?**:

This PR changes the cpu-time/wall-time sampling in the
CPU Profiling 2.0 profiler to expose the profiler overhead as part
of the captured data.

Specifically, during sampling, the profiler now measures how much
cpu-time/wall-time is spent by the profiler itself, and adds that
to the profiler as a separate stack (that shows up as belonging to
dd-trace-rb).

**Motivation**:

In the old Ruby profiler, the profiler always worked from a background
thread. Because we also sampled that background thread, the flamegraph
included the overhead of the profiler -- it was the overhead of the
background thread.

With CPU Profiling 2.0, sampling is done in whatever thread is holding
the Global VM Lock, which means that this part of the profiler overhead
was not being exposed. With this change, this impact on code is now
accounted for.

**Additional Notes**:

I was torn between a few options, and decided to use the stack
of the `CpuAndWallTimeWorker` thread to represent the overhead
of the profiler.

Thus, when this stack shows up in another thread, it represents
the overhead of the profiler on that thread.

I could have gone with a placeholder (e.g. just a frame that says
"Profiler Sampling" or something similar), but such placeholders
break the "Only My Code" feature.

Using a stack from a different thread than the one we're
supposedly "sampling" adds a bit of complexity; but we can always
undo this and go with the placeholder in this becomes problematic.

**How to test the change?**:

Change includes code coverage.
Ooops! I missed updating this spec since it doesn't run on all Rubies.
Because the new feature relies on adding extra samples to the
flamegraph, the

```ruby
expect(stats.fetch(:signal_handler_enqueued_sample)).to be >= sample_count
```

assertion was failing.
This will be useful when we want to process the profile and
either focus on or disregard these samples.
The profiler overhead frames show up non-deterministically (because
they show up in whatever thread has the Global VM Lock), so they can
cause a lot of the specs to become flaky.

Excluding them should be enough to avoid the issue.
@ivoanjo ivoanjo marked this pull request as ready for review February 8, 2023 17:16
@ivoanjo ivoanjo requested a review from a team February 8, 2023 17:16
@github-actions github-actions bot added the profiling Involves Datadog profiling label Feb 8, 2023
@marcotc
Copy link
Member

marcotc commented Feb 13, 2023

decided to use the stack of the CpuAndWallTimeWorker thread to represent the overhead of the profiler.

This seems ok. I agree that there's no straightforward lightweight solution here.
Any precise solution I can think of involves adding an extra dimension to all samples.
I think the chosen solution is a good compromise.

@ivoanjo
Copy link
Member Author

ivoanjo commented Feb 14, 2023

Thanks Marco for the review! :)

Merging away!

@ivoanjo ivoanjo merged commit cf7cadf into master Feb 14, 2023
@ivoanjo ivoanjo deleted the ivoanjo/prof-6558-expose-profiler-overhead-v2 branch February 14, 2023 11:31
@github-actions github-actions bot added this to the 1.10.0 milestone Feb 14, 2023
@TonyCTHsu TonyCTHsu mentioned this pull request Mar 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
profiling Involves Datadog profiling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants