-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PROF-4198] Gather CPU time in profiler without monkey patching Thread for all supported Ruby versions (2.1 to 3.x) #1740
[PROF-4198] Gather CPU time in profiler without monkey patching Thread for all supported Ruby versions (2.1 to 3.x) #1740
Conversation
…by < 2.6 By relying on the `debase-ruby_core_source` extension, we can get access to the internal VM headers needed to implement this functionality on all legacy Ruby versions we support. Note that this does not yet change the profiler to use `NativeExtension.clock_id_for` on these older Rubies.
This abstracts away the actual operation we want in our code -- getting the cpu time for a given thread, rather than exposing the clock_id and having callers handle it.
We now gather cpu time directly using the native extension, rather than relying on the `cthread.rb` monkey patch. This allows us to remove `cthread.rb` (next commit), as well as to clean up the whole "let's try to detect when there's missing instrumentation" approach we had going on. It also enables us to simpify the tests. All great advantages!
We want to preserve this knowledge and tests, so before deleting `cthread.rb` and its spec, let's move this to the stack sampler.
After we move CPU time to use the native extension, we no longer need the "CPU extension" that monkey patches `Thread` creation.
It's been fully replaced by our usage of the native extension to get this information.
`ffi` was only used by the profiling CPU time extension (`cthread.rb`) which has been removed so let's get rid of it. Thanks `ffi` for your service.
We still have a lot of other Windows-related failures, so let's get back to this once 3.1 is released.
Hopefully we can deprecate Ruby 2.1 soon.
Codecov Report
@@ Coverage Diff @@
## master #1740 +/- ##
==========================================
+ Coverage 98.18% 98.21% +0.03%
==========================================
Files 934 930 -4
Lines 45196 44844 -352
==========================================
- Hits 44374 44042 -332
+ Misses 822 802 -20
Continue to review full report at Codecov.
|
ddtrace.gemspec
Outdated
@@ -47,7 +47,7 @@ Gem::Specification.new do |spec| | |||
end | |||
|
|||
# Used by the profiler | |||
spec.add_dependency 'ffi', '~> 1.0' | |||
spec.add_dependency 'debase-ruby_core_source', '>= 0.10.12' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens for users with versions of Ruby not in the list, for example 2.5.6p201? https://github.com/ruby-debug/debase-ruby_core_source/tree/master/lib/debase/ruby_core_source
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It falls back to the closest lower version:
$ ruby -v
ruby 2.5.9p229 (2021-04-05 revision 67939) [x86_64-darwin19]
$ bundle exec rake clean compile
(...)
creating extconf.h
checking for vm_core.h... no
checking for vm_core.h... no
**************************************************************************
No source for ruby-2.5.9-p229 (revision 67939) provided with
debase-ruby_core_source gem. Falling back to ruby-2.5.1-p57.
**************************************************************************
checking for vm_core.h... yes
Currently they have the first version of every Ruby release (all the .0). For some versions, they're missing the latest point release (they have 2.2.9 but not 2.2.10), but these core VM apis seem to rarely shift, and we also test with the latest versions in our CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Major concern here is would this dependency in anyway prevent users who use tracing on supported versions, platforms and engines from installing & using tracing, if they were to upgrade ddtrace
and use this dependency?
Sounds like there's some kind of fallback mechanism; is this a sufficient guarantee against the prior question?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've replied in context in #1740 (comment) since I was already touching on this subject.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, just left a question about debase-ruby_core_source
headers.
Any concerns, @delner ? |
@ivoanjo Sorry, will take a look at this! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caveat being it's hard to wrap my head around all the details here, especially on the JIT-side, much of the changes in Ruby seem fine to me. I think it's a pretty strong sales pitch to drop the Thread
monkey patch for a simpler, more effective C implementation.
If I'm understanding the strategy here; we used to use the Thread
monkey-patch to access the native thread ID, since only a Thread
could access its own native thread ID given the Ruby C API. However, because we're adding headers through JIT or using debase
(somehow?), it makes this once inaccessible API available on the C-side, thus nullifying the need for the monkey patch. Do I have that right?
Only two major concerns I have, although they may already be adequately addressed, are:
ddtrace
users who don't need or want profiling. Does thedebase
extension at all prevent & negatively impact their usage of tracing, CI, or other products present inddtrace
? (Your description makes it sound like this is not the case, but just want to be totally clear on that.)- Security: injecting C headers from
debase
seems like a potential vulnerability. It helps that you say this repo is maintained by a reputable group. But it might not be a bad idea to check with a security expert on this first, just to make sure its kosher.
I think if we can just clarify those two things, this should be good on my end.
ddtrace.gemspec
Outdated
@@ -47,7 +47,7 @@ Gem::Specification.new do |spec| | |||
end | |||
|
|||
# Used by the profiler | |||
spec.add_dependency 'ffi', '~> 1.0' | |||
spec.add_dependency 'debase-ruby_core_source', '>= 0.10.12' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Major concern here is would this dependency in anyway prevent users who use tracing on supported versions, platforms and engines from installing & using tracing, if they were to upgrade ddtrace
and use this dependency?
Sounds like there's some kind of fallback mechanism; is this a sufficient guarantee against the prior question?
@@ -32,7 +34,8 @@ class Stack < Worker # rubocop:disable Metrics/ClassLength | |||
:trace_identifiers_helper, | |||
:ignore_thread, | |||
:max_time_usage_pct, | |||
:thread_api | |||
:thread_api, | |||
:cpu_time_provider |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this abstraction.
Yes. There's quite a bit of boilerplate and handling of Ruby versions and OS's, but in the end it all boils down to building this dd-trace-rb/ext/ddtrace_profiling_native_extension/private_vm_api_access.c Lines 16 to 25 in 3995004
...that given any I did change a bit how the Stack sampler accesses the data to simplify it, but the crux of the change is those 9 lines.
I expect no impact on other products. We have been shipping and compiling an empty native extension (#1584) since 0.52.0 (released early August). So in ~3 months of it being out there, no customers have reported any issues with this, and validating this was my intent when I shipped an empty extension. This was the intended result, because we depend on both Specifically for
Finally, but quite importantly, loading the profiling native extension is carefully guarded and checked: dd-trace-rb/lib/ddtrace/profiling.rb Lines 89 to 126 in 3995004
...to ensure that if the native extension fails to load (or fails its self-check), we gracefully handle this with a simple log message and allow the user code to keep running with no impact.
I alluded to this above, but to clarify, this dependency is a pure-Ruby gem, which only ships a bit of Ruby code, and then the Ruby headers, which unless directly used by another gem (like ddtrace) just lie dormant. So I can't think of any way for this gem, by itself, of ever failing to install on any platform, version, or engine. Of course, this gem is useless if you are on JRuby or TruffleRuby or Rubinius, but still cleanly installs on those. When used by dd-trace-rb, we have full control over when to use it (as explained above), so we can turn it on only for the setups that make sense.
I will follow up on this, thanks for raising it. Worst case, we can fork the package just to make sure we have control of the release, but I hope it won't come to that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Linter is failing, but once that's fixed this is good to go on my end.
Thanks for the thorough and detailed explanations! I think this will be a good change.
Linter broke due to a new Rubocop version:
This fix is already in master a7801f8 . I'll merge this in and confirm that the master build is green. |
Our `Thread` monkey patch, including `#update_native_ids` was removed in #1740. This code was left over in `setup.rb` since we had a fallback for `Thread`s without the monkey patch, but effectively never runs.
**What does this PR do?**: Allow the latest `debase-ruby_core_source` version to be used by dd-trace-rb. **Motivation**: As discussed in #1740, we decided to pin the range of versions of `debase-ruby_core_source` we use, so that we can validate each version as it comes out. The changes on 0.10.18 (<https://my.diffend.io/gems/debase-ruby_core_source/0.10.17/0.10.18>) don't impact dd-trace-rb, because they only affect Ruby 3.2-preview3 and we don't use `debase-ruby_core_source` on Ruby >= 2.6. Nevertheless, it's important we keep our version up-to-date as customers may need the latest version of `debase-ruby_core_source` for other gems they use. **Additional Notes**: (None) **How to test the change?**: CI should still be green for all supported Ruby versions.
**What does this PR do?**: Allow using the latest `debase-ruby_core_source` together with dd-trace-rb. **Motivation**: As discussed in #1740, we decided to pin the range of versions of `debase-ruby_core_source` we use, so that we can validate each version as it comes out. The changes on 3.2.0 (<https://my.diffend.io/gems/debase-ruby_core_source/0.10.18/3.2.0>) don't impact dd-trace-rb, because they only affect Ruby 3.2.0 and 2.0.0 and we don't use `debase-ruby_core_source` on Ruby >= 2.6 (nor support Ruby 2.0). Nevertheless, it's important we keep our version up-to-date as customers may need the latest version of `debase-ruby_core_source` for other gems they use. **Additional Notes**: (None) **How to test the change?**: CI should still be green for all supported Ruby versions.
In #1735 I added initial support to gather CPU time information without relying on monkey patching for Ruby versions 2.6+.
As a reminder from that PR:
This PR then finishes the work started in #1735 by extending the profiling native extension to also support legacy Ruby versions (e.g. < 2.6), and thus meaning that we can use the profiling native extension on all supported Ruby versions.
We're still restricted to Linux, but that was also the case with the previous approach.
Because we can now use the profiling native extension for all Rubies, this PR also includes the next step: it wires the Stack collector to use the native extension directly, and then removes all references to our previous
cthread.rb
monkey patch. Hence most of this PR being deletions.Because the
cthread.rb
was the only reason we depended onffi
, I've also gone ahead and removed that dependency.Finally, but still quite importantly, this PR adds a new dependency to dd-trace-rb: the
debase-ruby_core_source
gem.The
debase-ruby_core_source
gem is effectively a just tarball of Ruby VM internal headers for every Ruby version. Because these headers are not installed in regular Ruby deployments, we use this gem to get access to a copy of them, and thus to enable the profiling native extension to reference internal VM structures that are not otherwise exposed (seeNativeExtensionDesign.md
for more details).I claim it's both ok and desirable to add this gem as a dependency because:
debase
, which is the team working at JetBrains on the RubyMine IDE (debase is officially used by RubyMine for Ruby debugging)