Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure which response codes are considered errors for Net/HTTP #2501

Merged
merged 11 commits into from
Jan 20, 2023

Conversation

caramcc
Copy link
Contributor

@caramcc caramcc commented Dec 20, 2022

What does this PR do?
Adds the ability to configure which HTTP response codes are considered errors for the Net/HTTP module

Motivation
The applications I'm instrumenting make a lot of API calls to external services using Net/HTTP. Since the services are user-configured, we expect a relatively large volume of these requests to fail with 4xx or 5xx status codes. As a result, we end up ingesting a lot of traces due to ingestion_reason:error that we don't consider to be an error and don't otherwise care about.

Adding the ability to configure which HTTP status codes are considered errors will allow us to reduce our ingestion rates without entirely ignoring all Net/HTTP spans that return a 4xx or 5xx status code.

(See my feature request for this issue for more context.)

How to test the change?
I tested the change with bundle exec appraisal ruby-3.0.4-contrib rake spec:http from the Docker container described in the development guide. All specs, including the new one that I've added in this PR, passed locally.

@caramcc caramcc requested a review from a team December 20, 2022 20:19
@github-actions github-actions bot added integrations Involves tracing integrations tracing labels Dec 20, 2022
Copy link
Contributor

@TonyCTHsu TonyCTHsu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋 @caramcc , Thanks for the contribution and great work!

Copy link
Contributor

@delner delner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty straightforward; looking good! Just a few small suggestions for accessibility, quality.

@@ -33,6 +33,8 @@ class Settings < Contrib::Configuration::Settings
o.lazy
end

option :response_code_errors, default: 400...599
Copy link
Contributor

@delner delner Dec 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking maybe this should be configurable by env var, too? Not all users can practically configure using code in their deployments. We should have some prior art for parsing error code ranges in configuration we could borrow from.

@marcotc thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and added this as well so it could be reviewed, happy to revert that change if it would be better not to add it

@@ -101,6 +101,17 @@
expect(span.get_tag('error.message')).to be nil
end

context 'with custom response error codes' do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be good to see some positive and bad input examples as well (e.g. nil, in range, etc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added additional test cases for in-range errors and the default case, but I'm a bit stuck on testing bad input (e.g. explicitly setting the value to nil or another wrong type).

Is the convention for this project to do type-checking or type-casting in the settings, in the application code where the setting is invoked, something that the configuration module is expected to handle, or is it the end-user's responsibility to use the correct types for configuration?

Happy to make any changes as appropriate — if you're aware of prior art around checking settings like that, could you share an example or two?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something that the configuration module is expected to handle

For our global configuration settings, those that appear in the Datadog.configure do |c| c.this_option_here end, we have testing covering different variations of configured inputs, for example.

I noticed that configuration inside contrib/ doesn't have this strict testing today, because most settings under contrib/ influence a few key behaviours that are tested in the integration tests of the instrumented gem at hand.

For the case of response_code_errors, because it's such a good option that we probably want to also implement it for all other HTTP client gems that we support, I suggest we move the tests you have here, inside context 'with custom response error codes' do, into its own group of shared_examples, and then include that shared_example here.

This way, these tests can be used by all other HTTP client gems when someone goes to implement them.

And because the tests will be shared, it's a good idea to be very through with them, as each test in that group will be executed multiple times, across different contrib/ gems.

I suggest adding tests for:

  • "with a status code lesser than the default range" (399)
  • "with a status code at the beginning of the default range" (400)
  • "with a status code at end beginning of the default range" (599)
  • "with a status code greater than the default range" (600)
  • "with an empty array"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few questions about the suggested changes here — apologies if some are a bit obvious, I'm an infrastructure engineer and don't have a lot of experience with some of these rspec patterns.

  • Which file specifically should I be adding the shared_examples definition to? Most of the settings_spec.rb files that I looked at seem to be validating that configuration options get set, and nothing referencing spans, so I'm guessing that isn't right.

  • I noticed some shared_examples defined in the other HTTP library specs (e.g. HttpClient, HttpRb) but afaict both are just referenced in the same file (with it_behaves_like 'instrumented request') — is that what I should be doing here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Which file specifically should I be adding the shared_examples definition to? Most of the settings_spec.rb files that I looked at seem to be validating that configuration options get set, and nothing referencing spans, so I'm guessing that isn't right.

For HTTP status testing, because it will be shared with other contrib/ HTTP clients, I suggest adding it to a new file:
spec/datadog/tracing/contrib/http_examples.rb. For reference, this is related to this comment from earlier:

I suggest adding tests for:
"with a status code lesser than the default range" (399)
"with a status code at the beginning of the default range" (400)
"with a status code at end beginning of the default range" (599)
"with a status code greater than the default range" (600)
"with an empty array"

  • I noticed some shared_examples defined in the other HTTP library specs (e.g. HttpClient, HttpRb) but afaict both are just referenced in the same file (with it_behaves_like 'instrumented request') — is that what I should be doing here?

You will be creating a shared_examples in spec/datadog/tracing/contrib/http_examples.rb and referencing it in spec/datadog/tracing/contrib/http/request_spec.rb with a include_examples. You'll have to require http_examples.rb for it to be visible to request_spec.rb.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's a POC snippet:

RSpec.shared_examples 'with error status code configuration' do
  context 'with a custom range' do
    context "with an Range object" do
      let(:configuration_options) { { response_code_errors: 500..502 } }

      context 'with a status code within the range' do
        let(:status_code) { 399 }

        it 'marks the span as an error' do
          expect(span).to have_error
        end
      end

      context 'with a status code outside of the range' do
        let(:status_code) { 503 }

        it 'does not mark the span as an error' do
          expect(span).to_not have_error
        end
      end
    end

    context "with an Array object" do
      let(:status_code) { 500 }

      context "with an empty array" do
        let(:configuration_options) { { response_code_errors: [] } }

        it 'does not mark the span as an error' do
          expect(span).to_not have_error
        end
      end

      context 'with a status code in the array' do
        let(:configuration_options) { { response_code_errors: [400,500] } }
        let(:status_code) { 400 }

        it 'marks the span as an error' do
          expect(span).to have_error
        end
      end

      context 'with a status code not in the array' do
        let(:configuration_options) { { response_code_errors: [400,500] } }
        let(:status_code) { 399 }

        it 'does not mark the span as an error' do
          expect(span).to_not have_error
        end
      end
    end
  end

  context 'with the default range' do
    context "with a status code lesser than the range" do
      let(:status_code) { 399 }

      it 'does not mark the span as an error' do
        expect(span).to_not have_error
      end
    end

    context "with a status code at the beginning of the range" do
      let(:status_code) { 400 }

      it 'marks the span as an error' do
        expect(span).to have_error
      end
    end

    context "with a status code at the end of the range" do
      let(:status_code) { 599 }

      it 'marks the span as an error' do
        expect(span).to have_error
      end
    end

    context "with a status code greater than the range" do
      let(:status_code) { 600 }

      it 'does not mark the span as an error' do
        expect(span).to_not have_error
      end
    end
  end
end

Notice these shared_examples depend on the test that includes it to use the value for configuration_options, status_code, and span.
You tests already use configuration_options and span, so that part should just work, but you'll have to use status_code to configure your test to return the status code in the status_code local method, like so: before { stub_request(...).to_return(status: status_code, ...) }. This will, hopefully, make tests pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was incredibly helpful guidance, thank you very much! I've updated the specs to use this pattern (using your POC for the http_examples) — they're all passing locally for me

@delner delner added the community Was opened by a community member label Dec 21, 2022
@delner delner assigned delner and caramcc and unassigned delner Dec 21, 2022
@@ -1367,6 +1367,7 @@ content = Net::HTTP.get(URI('http://127.0.0.1/index.html'))
| `distributed_tracing` | Enables [distributed tracing](#distributed-tracing) | `true` |
| `service_name` | Service name used for `http` instrumentation | `'net/http'` |
| `split_by_domain` | Uses the request domain as the service name when set to `true`. | `false` |
| `response_code_errors` | Range or Array of HTTP status codes that should be traced as errors. | `400...599` |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest changing this to error_status_codes, given "status code" here can only refer to the response (the HTTP request does not have a status code):

Suggested change
| `response_code_errors` | Range or Array of HTTP status codes that should be traced as errors. | `400...599` |
| `error_status_codes` | Range or Array of HTTP status codes that should be traced as errors. | `400...599` |

@@ -33,6 +33,11 @@ class Settings < Contrib::Configuration::Settings
o.lazy
end

option :response_code_errors do |o|
o.default { env_to_list(Ext::ENV_RESPONSE_CODE_ERRORS, 400...599, comma_separated_only: false)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(400...599).include?(599)
# => false

The endless range does not include 599. I suggest changing the end value to 600, keeping the range endless:

Suggested change
o.default { env_to_list(Ext::ENV_RESPONSE_CODE_ERRORS, 400...599, comma_separated_only: false)}
o.default { env_to_list(Ext::ENV_RESPONSE_CODE_ERRORS, 400...600, comma_separated_only: false)}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense 👍

FWIW This default value of 400...599 was taken from the current behavior in order to keep this change backwards-compatible:

case response.code.to_i
when 400...599
span.set_error(response)
end

Do you think it's worth making a backwards-incompatible change to update this range?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the exclusion of 599 is likely an off-by-one bug, it should be fixed.
Bug fixes are allowed to perform behaviour changes.

Copy link
Member

@marcotc marcotc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @caramcc! It all looks great! ✅

We'll address the CI failures, which are unrelated to your changes.

@caramcc
Copy link
Contributor Author

caramcc commented Jan 12, 2023

Awesome, thank you! I should have a PR to add this feature to httpclient and httprb ready to go within the next couple days as well.

@marcotc
Copy link
Member

marcotc commented Jan 12, 2023

👋 @caramcc, one last thing: would you fix the Rubocop issues in the PR? https://app.circleci.com/pipelines/github/DataDog/dd-trace-rb/8567/workflows/26c184ea-bbd2-440e-aa2b-3078caece159/jobs/318635

It seems like they are all auto-correctable (1161 files inspected, 11 offenses detected, 11 offenses autocorrectable), so a bundle exec rake rubocop:autocorrect will fix them all for you.

@codecov-commenter
Copy link

Codecov Report

Merging #2501 (ed44f21) into master (92e4442) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #2501      +/-   ##
==========================================
- Coverage   98.03%   98.03%   -0.01%     
==========================================
  Files        1126     1127       +1     
  Lines       60814    60860      +46     
  Branches     2773     2772       -1     
==========================================
+ Hits        59619    59664      +45     
- Misses       1195     1196       +1     
Impacted Files Coverage Δ
...dog/tracing/contrib/http/configuration/settings.rb 100.00% <100.00%> (ø)
lib/datadog/tracing/contrib/http/ext.rb 100.00% <100.00%> (ø)
...ib/datadog/tracing/contrib/http/instrumentation.rb 97.05% <100.00%> (-0.05%) ⬇️
spec/datadog/tracing/contrib/http/request_spec.rb 98.99% <100.00%> (-0.01%) ⬇️
spec/datadog/tracing/contrib/http_examples.rb 100.00% <100.00%> (ø)
...atadog/tracing/contrib/grpc/support/grpc_helper.rb 98.24% <0.00%> (-1.76%) ⬇️
lib/datadog/core/diagnostics/environment_logger.rb 98.46% <0.00%> (-0.79%) ⬇️
...ec/datadog/tracing/contrib/sidekiq/patcher_spec.rb 100.00% <0.00%> (+4.00%) ⬆️

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@TonyCTHsu TonyCTHsu merged commit 095cffb into DataDog:master Jan 20, 2023
@github-actions github-actions bot added this to the 1.9.0 milestone Jan 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community Was opened by a community member integrations Involves tracing integrations tracing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants