Improve clearing of profile data after Ruby app forks #2362
Merged
Conversation
Calling `exporter.flush` after fork to clear any existing profiling data works, but it's rather heavy-handed and can generate a few confusing log messages:

```
DEBUG (dd-trace-rb/lib/datadog/profiling/scheduler.rb:73:in `after_fork') Flushing exporter in child process #after_fork and discarding data
DEBUG (dd-trace-rb/lib/datadog/profiling/stack_recorder.rb:25:in `serialize') Encoded profile covering 2022-11-10T09:42:22Z to 2022-11-10T09:42:22Z
DEBUG (dd-trace-rb/lib/datadog/profiling/exporter.rb:52:in `flush') Skipped exporting profiling events as profile duration is below minimum
```

To improve this, I've created a new `Exporter#clear` that is still (currently) implemented using `flush`, but now the exporter knows more explicitly what's going on, and so it can implement `clear` in a better way.
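The shape of the change can be sketched as follows. This is a minimal, hypothetical simplification — the real dd-trace-rb `Exporter` takes more collaborators and its `#flush` does far more than serialize — but it shows why exposing the intent as `#clear` matters: the implementation can start out delegating to `#flush` and later be swapped for something that skips serialization entirely.

```ruby
# Hypothetical, simplified sketch of the Exporter change described above.
class Exporter
  def initialize(recorder)
    @recorder = recorder
  end

  # Serializes accumulated profiling data for reporting.
  def flush
    @recorder.serialize
  end

  # Discards accumulated data without reporting it. Currently implemented on
  # top of #flush, but because the intent is now explicit, this can later be
  # reimplemented without the useless serialization work (and its log noise).
  def clear
    flush
    nil
  end
end
```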
…eeded The previous `flush` call was a placeholder, and we can now improve it.
This will be used by the `Exporter` when needing to clear data without serializing it.
As the `StackRecorder` implements `#clear` natively, there's no need to simulate it with a call to `#serialize`. I decided not to bother updating the `OldRecorder`, so for that class we'll still use the old behavior.
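The two behaviors can be contrasted with a toy sketch. Both classes below are hypothetical stand-ins (the real `StackRecorder#clear` is implemented natively in C, not as an Array operation); the point is the difference between clearing directly and simulating clearing via serialize-and-discard.

```ruby
# Stand-in for the new recorder: clearing is a first-class operation.
class StackRecorderSketch
  def initialize
    @samples = []
  end

  def record(sample)
    @samples << sample
  end

  def serialize
    @samples.dup
  end

  def clear
    @samples.clear # drop data directly, no serialization needed
  end
end

# Stand-in for OldRecorder: no native #clear, so clearing is simulated by
# serializing the accumulated data and throwing the result away.
class OldRecorderSketch
  def initialize
    @samples = []
  end

  def record(sample)
    @samples << sample
  end

  def serialize
    result, @samples = @samples, []
    result
  end

  def clear
    serialize # wasted work: encodes data only to discard it
    nil
  end
end
```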
Codecov Report
@@                    Coverage Diff                     @@
##    ivoanjo/prof-5943-handle-vm-forking    #2362   +/- ##
============================================================
   Coverage    98.31%    98.32%
============================================================
   Files         1102      1102
   Lines        58994     59070    +76
============================================================
 + Hits         58000     58080    +80
 + Misses         994       990     -4
Base automatically changed from ivoanjo/prof-5943-handle-vm-forking to master on November 10, 2022, 14:20.
marcotc reviewed on Nov 10, 2022
marcotc reviewed on Nov 10, 2022
marcotc approved these changes on Nov 14, 2022
ivoanjo added a commit that referenced this pull request on Nov 15, 2022
In the previous commit, we refactored the old profiler codepath so that the `Profiler` would call `#reset_after_fork` on collectors before restarting them in a forking process.

In this commit, we use the added hook to propagate the `#reset_after_fork` call to every component, so that they can clean up their internal state in the child process of a fork. In particular:

* The chain starts on the `Collectors::CpuAndWallTimeWorker`, which must begin by disabling its active `TracePoint`s so that something like GC doesn't trigger a sample while we're still resetting the profiler.
* Then the `Collectors::CpuAndWallTime` resets:
  a. Its per-thread tracking information. The native thread ids and CPU-time tracking of the thread that survived the fork need to be reset, and all other threads, which did not survive the fork, need to be cleared.
  b. Its internal stats.
* Then the `StackRecorder` resets:
  a. The active slot and its concurrency control. This avoids any issues if the fork happened while a serialization attempt was ongoing and left the concurrency control in an inconsistent state.
  b. The profiles, so there's no leftover data from the parent process in the child process profiles.

The `#reset_after_fork` approach made the `StackRecorder#clear` method added recently in #2362 no longer needed, so I went ahead and removed it.
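The ordering of the chain described above — worker first (so sampling stops), then collector, then recorder — can be sketched with simplified stand-in classes. All class names and the `log` parameter are hypothetical illustration devices; the real components reset native state rather than appending to a log.

```ruby
# Simplified sketch of the #reset_after_fork propagation chain. Each step
# appends a marker so the ordering is visible: worker -> collector -> recorder.
class WorkerSketch
  def initialize(collector)
    @collector = collector
  end

  def reset_after_fork(log)
    log << :tracepoints_disabled # stop sampling (e.g. on GC) during the reset
    @collector.reset_after_fork(log)
  end
end

class CollectorSketch
  def initialize(recorder)
    @recorder = recorder
  end

  def reset_after_fork(log)
    log << :per_thread_state_reset # surviving thread re-tracked, dead threads cleared
    log << :stats_reset
    @recorder.reset_after_fork(log)
  end
end

class RecorderSketch
  def reset_after_fork(log)
    log << :slot_concurrency_reset # in case fork interrupted a serialization
    log << :profiles_cleared       # drop the parent process's data
  end
end
```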
What does this PR do?:
When a Ruby app forks, we automatically restart the profiler in the child process as well. This ensures that in apps such as forking webservers we still provide full profiling coverage.
As part of automatically restarting the profiler in the child process, we want to clear any profiling data that has accumulated from the parent, as otherwise both the parent as well as the child process would report that data.
This "clearing of profiling data" happens in `Scheduler#after_fork`. Previously, this method would "clear data" by asking the exporter for the latest profile and then throwing it away. This worked, but it was rather heavy-handed -- it did some useless work, as well as generated a few slightly-confusing debug logs.

To improve this for the new Ruby profiler, I've decided to implement a `StackRecorder#clear`, and then refactor the `Scheduler` and `Exporter` to use it instead. For the old Ruby profiler (still the default, in use by customers), I chose not to implement a `#clear`, and thus the old approach of serializing a profile and throwing it away is still used.

Motivation:
I've wanted to improve this for some time, but chose not to because I didn't want to invest in improving the old profiler just to then throw it away. Now that the new profiler is getting into better shape, I finally decided to implement this.
Additional Notes:
This PR is on top of #2359 just for my development convenience; it has no explicit dependencies on it.
How to test the change?:
Validate that profiles from the child process of a Ruby app that forks do not contain any profiling data from the parent process by looking at the profiler UX.
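The property being validated can also be illustrated with a toy, self-contained script: after a fork, the child clears the data it inherited, so nothing from the parent shows up in what the child reports. This uses a plain Array in place of the real profiler state and `Kernel#fork` directly (so it assumes a fork-capable platform); it is an illustration of the expected behavior, not the actual dd-trace-rb test.

```ruby
# Toy illustration: child clears inherited data after fork, parent keeps its own.
samples = [:parent_sample_1, :parent_sample_2]

read_io, write_io = IO.pipe
pid = fork do
  read_io.close
  samples.clear # what the after-fork clearing achieves in the child process
  write_io.write(Marshal.dump(samples))
  write_io.close
  exit!(0)
end
write_io.close

child_samples = Marshal.load(read_io.read)
read_io.close
Process.wait(pid)
```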