-
Notifications
You must be signed in to change notification settings - Fork 375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Configure which response codes are considered errors for Net/HTTP #2501
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👋 @caramcc , Thanks for the contribution and great work!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty straightforward; looking good! Just a few small suggestions for accessibility, quality.
@@ -33,6 +33,8 @@ class Settings < Contrib::Configuration::Settings | |||
o.lazy | |||
end | |||
|
|||
option :response_code_errors, default: 400...599 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking maybe this should be configurable by env var, too? Not all users can practically configure using code in their deployments. We should have some prior art for parsing error code ranges in configuration we could borrow from.
@marcotc thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and added this as well so it could be reviewed, happy to revert that change if it would be better not to add it
@@ -101,6 +101,17 @@ | |||
expect(span.get_tag('error.message')).to be nil | |||
end | |||
|
|||
context 'with custom response error codes' do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it'd be good to see some positive and bad input examples as well (e.g. nil
, in range, etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added additional test cases for in-range errors and the default case, but I'm a bit stuck on testing bad input (e.g. explicitly setting the value to nil
or another wrong type).
Is the convention for this project to do type-checking or type-casting in the settings, in the application code where the setting is invoked, something that the configuration module is expected to handle, or is it the end-user's responsibility to use the correct types for configuration?
Happy to make any changes as appropriate — if you're aware of prior art around checking settings like that, could you share an example or two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
something that the configuration module is expected to handle
For our global configuration settings, those that appear in the Datadog.configure do |c| c.this_option_here end
, we have testing covering different variations of configured inputs, for example.
I noticed that configuration inside contrib/
doesn't have this strict testing today, because most settings under contrib/
influence a few key behaviours that are tested in the integration tests of the instrumented gem at hand.
For the case of response_code_errors
, because it's such a good option that we probably want to also implement it for all other HTTP client gems that we support, I suggest we move the tests you have here, inside context 'with custom response error codes' do
, into its own group of shared_examples
, and then include that shared_example here.
This way, these tests can be used by all other HTTP client gems when someone goes to implement them.
And because the tests will be shared, it's a good idea to be very through with them, as each test in that group will be executed multiple times, across different contrib/
gems.
I suggest adding tests for:
- "with a status code lesser than the default range" (399)
- "with a status code at the beginning of the default range" (400)
- "with a status code at end beginning of the default range" (599)
- "with a status code greater than the default range" (600)
- "with an empty array"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few questions about the suggested changes here — apologies if some are a bit obvious, I'm an infrastructure engineer and don't have a lot of experience with some of these rspec patterns.
-
Which file specifically should I be adding the
shared_examples
definition to? Most of thesettings_spec.rb
files that I looked at seem to be validating that configuration options get set, and nothing referencing spans, so I'm guessing that isn't right. -
I noticed some
shared_examples
defined in the other HTTP library specs (e.g. HttpClient, HttpRb) but afaict both are just referenced in the same file (withit_behaves_like 'instrumented request'
) — is that what I should be doing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Which file specifically should I be adding the
shared_examples
definition to? Most of thesettings_spec.rb
files that I looked at seem to be validating that configuration options get set, and nothing referencing spans, so I'm guessing that isn't right.
For HTTP status testing, because it will be shared with other contrib/
HTTP clients, I suggest adding it to a new file:
spec/datadog/tracing/contrib/http_examples.rb
. For reference, this is related to this comment from earlier:
I suggest adding tests for:
"with a status code lesser than the default range" (399)
"with a status code at the beginning of the default range" (400)
"with a status code at end beginning of the default range" (599)
"with a status code greater than the default range" (600)
"with an empty array"
- I noticed some
shared_examples
defined in the other HTTP library specs (e.g. HttpClient, HttpRb) but afaict both are just referenced in the same file (withit_behaves_like 'instrumented request'
) — is that what I should be doing here?
You will be creating a shared_examples
in spec/datadog/tracing/contrib/http_examples.rb
and referencing it in spec/datadog/tracing/contrib/http/request_spec.rb
with a include_examples
. You'll have to require http_examples.rb
for it to be visible to request_spec.rb
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a POC snippet:
RSpec.shared_examples 'with error status code configuration' do
context 'with a custom range' do
context "with an Range object" do
let(:configuration_options) { { response_code_errors: 500..502 } }
context 'with a status code within the range' do
let(:status_code) { 399 }
it 'marks the span as an error' do
expect(span).to have_error
end
end
context 'with a status code outside of the range' do
let(:status_code) { 503 }
it 'does not mark the span as an error' do
expect(span).to_not have_error
end
end
end
context "with an Array object" do
let(:status_code) { 500 }
context "with an empty array" do
let(:configuration_options) { { response_code_errors: [] } }
it 'does not mark the span as an error' do
expect(span).to_not have_error
end
end
context 'with a status code in the array' do
let(:configuration_options) { { response_code_errors: [400,500] } }
let(:status_code) { 400 }
it 'marks the span as an error' do
expect(span).to have_error
end
end
context 'with a status code not in the array' do
let(:configuration_options) { { response_code_errors: [400,500] } }
let(:status_code) { 399 }
it 'does not mark the span as an error' do
expect(span).to_not have_error
end
end
end
end
context 'with the default range' do
context "with a status code lesser than the range" do
let(:status_code) { 399 }
it 'does not mark the span as an error' do
expect(span).to_not have_error
end
end
context "with a status code at the beginning of the range" do
let(:status_code) { 400 }
it 'marks the span as an error' do
expect(span).to have_error
end
end
context "with a status code at the end of the range" do
let(:status_code) { 599 }
it 'marks the span as an error' do
expect(span).to have_error
end
end
context "with a status code greater than the range" do
let(:status_code) { 600 }
it 'does not mark the span as an error' do
expect(span).to_not have_error
end
end
end
end
Notice these shared_examples
depend on the test that includes it to use the value for configuration_options
, status_code
, and span
.
You tests already use configuration_options
and span
, so that part should just work, but you'll have to use status_code
to configure your test to return the status code in the status_code
local method, like so: before { stub_request(...).to_return(status: status_code, ...) }
. This will, hopefully, make tests pass.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was incredibly helpful guidance, thank you very much! I've updated the specs to use this pattern (using your POC for the http_examples
) — they're all passing locally for me
docs/GettingStarted.md
Outdated
@@ -1367,6 +1367,7 @@ content = Net::HTTP.get(URI('http://127.0.0.1/index.html')) | |||
| `distributed_tracing` | Enables [distributed tracing](#distributed-tracing) | `true` | | |||
| `service_name` | Service name used for `http` instrumentation | `'net/http'` | | |||
| `split_by_domain` | Uses the request domain as the service name when set to `true`. | `false` | | |||
| `response_code_errors` | Range or Array of HTTP status codes that should be traced as errors. | `400...599` | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest changing this to error_status_codes
, given "status code" here can only refer to the response (the HTTP request does not have a status code):
| `response_code_errors` | Range or Array of HTTP status codes that should be traced as errors. | `400...599` | | |
| `error_status_codes` | Range or Array of HTTP status codes that should be traced as errors. | `400...599` | |
@@ -33,6 +33,11 @@ class Settings < Contrib::Configuration::Settings | |||
o.lazy | |||
end | |||
|
|||
option :response_code_errors do |o| | |||
o.default { env_to_list(Ext::ENV_RESPONSE_CODE_ERRORS, 400...599, comma_separated_only: false)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(400...599).include?(599)
# => false
The endless range does not include 599
. I suggest changing the end value to 600, keeping the range endless:
o.default { env_to_list(Ext::ENV_RESPONSE_CODE_ERRORS, 400...599, comma_separated_only: false)} | |
o.default { env_to_list(Ext::ENV_RESPONSE_CODE_ERRORS, 400...600, comma_separated_only: false)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense 👍
FWIW This default value of 400...599
was taken from the current behavior in order to keep this change backwards-compatible:
dd-trace-rb/lib/datadog/tracing/contrib/http/instrumentation.rb
Lines 98 to 101 in e5bb0eb
case response.code.to_i | |
when 400...599 | |
span.set_error(response) | |
end |
Do you think it's worth making a backwards-incompatible change to update this range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the exclusion of 599 is likely an off-by-one bug, it should be fixed.
Bug fixes are allowed to perform behaviour changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @caramcc! It all looks great! ✅
We'll address the CI failures, which are unrelated to your changes.
Awesome, thank you! I should have a PR to add this feature to |
👋 @caramcc, one last thing: would you fix the Rubocop issues in the PR? https://app.circleci.com/pipelines/github/DataDog/dd-trace-rb/8567/workflows/26c184ea-bbd2-440e-aa2b-3078caece159/jobs/318635 It seems like they are all auto-correctable ( |
Codecov Report
@@ Coverage Diff @@
## master #2501 +/- ##
==========================================
- Coverage 98.03% 98.03% -0.01%
==========================================
Files 1126 1127 +1
Lines 60814 60860 +46
Branches 2773 2772 -1
==========================================
+ Hits 59619 59664 +45
- Misses 1195 1196 +1
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
What does this PR do?
Adds the ability to configure which HTTP response codes are considered errors for the Net/HTTP module
Motivation
The applications I'm instrumenting make a lot of API calls to external services using Net/HTTP. Since the services are user-configured, we expect a relatively large volume of these requests to fail with 4xx or 5xx status codes. As a result, we end up ingesting a lot of traces due to
ingestion_reason:error
that we don't consider to be an error and don't otherwise care about.Adding the ability to configure which HTTP status codes are considered errors will allow us to reduce our ingestion rates without entirely ignoring all Net/HTTP spans that return a 4xx or 5xx status code.
(See my feature request for this issue for more context.)
How to test the change?
I tested the change with
bundle exec appraisal ruby-3.0.4-contrib rake spec:http
from the Docker container described in the development guide. All specs, including the new one that I've added in this PR, passed locally.