Distributed tracing for Sidekiq #2513

TonyCTHsu · 2023-01-02T21:51:28Z

What does this PR do?
#2295

Implement distributed tracing for Sidekiq.

lib/datadog/tracing/contrib/sidekiq/client_tracer.rb

lib/datadog/tracing/contrib/sidekiq/distributed/propagation.rb

lib/datadog/tracing/contrib/sidekiq/server_tracer.rb

spec/datadog/tracing/contrib/sidekiq/distributed_tracing_spec.rb

sled · 2023-03-13T13:36:35Z

Is there any way I could help getting this PR merged? @TonyCTHsu

TonyCTHsu · 2023-03-21T10:32:01Z

👋 @sled, thanks for offering to help.

I hesitated to continue this because distributed tracing for asynchronous processes is difficult to make sense of in the UI. Distributed tracing works by context propagation. However, an asynchronous process leaves a huge gap between the time a job is pushed onto a queue and the time a worker picks it up (a typical HTTP request/response cycle takes milliseconds, while this gap can take seconds or longer, depending on how long the job has been waiting in the queue). This gap takes up too much space in the trace graph and makes the spans (the actual code execution) tiny and hard to read.

Furthermore, what is the entire trace expected to look like when a job fails and is pushed back onto the queue, then retried several times before it is considered dead? For an asynchronous process, the duration of the trace could easily extend to days or weeks.

If you are interested in this feature, perhaps we could release it as an opt-in configuration. How does that sound?
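For readers unfamiliar with the mechanism, here is a minimal sketch of context propagation through a job payload and of the queue gap described above. It is not the code in this PR: the `trace_context` and `injected_at` keys and both class names are hypothetical, and only the `call` signatures follow the shape of Sidekiq client and server middleware. It uses nothing beyond core Ruby.

```ruby
# Hypothetical payload key; not necessarily what the PR uses.
TRACE_KEY = 'trace_context'

# Client side: copies the active trace context into the job payload.
# The call signature mirrors a Sidekiq client middleware.
class InjectTraceContext
  def initialize(active_context)
    @active_context = active_context
  end

  def call(_worker_class, job, _queue, _redis_pool)
    job[TRACE_KEY] = @active_context
    job['injected_at'] = Time.now.to_f # hypothetical timestamp key
    yield
  end
end

# Server side: reads the context back and measures the time spent in the queue.
# The call signature mirrors a Sidekiq server middleware.
class ExtractTraceContext
  attr_reader :last_context, :last_queue_wait

  def call(_worker, job, _queue)
    @last_context = job[TRACE_KEY]
    @last_queue_wait = Time.now.to_f - job['injected_at']
    yield
  end
end

job = { 'class' => 'HardJob', 'args' => [1] }
client = InjectTraceContext.new({ 'trace_id' => 'abc123', 'parent_id' => 'push-span' })
client.call('HardJob', job, 'default', nil) { :pushed }

# Simulate five seconds of waiting in the queue by backdating the timestamp.
job['injected_at'] -= 5

server = ExtractTraceContext.new
server.call(Object.new, job, 'default') { :performed }

puts "continued trace #{server.last_context['trace_id']}, " \
     "queue wait #{server.last_queue_wait.round(2)}s"
```

The worker continues the same trace ID, but the span it opens starts only after the queue wait, which is the gap that dominates the trace graph.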

sled · 2023-03-21T17:21:45Z

@TonyCTHsu if this is just a display issue, maybe the Datadog UI team could have a look at it? The big gaps between enqueueing and execution of the job could be squished, for example.

I think distributed tracing is an excellent fit for asynchronous systems like background jobs or event processing because it allows you to stitch together the whole picture starting from the initiator. Retries should also continue the original trace. A common scenario is a faulty application (v1) that enqueued jobs with wrong parameters. This gets fixed in v2, but you might still see job retries. With distributed tracing, you can easily trace those retried jobs back to the faulty version (v1) that originally enqueued them.
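The retry scenario can be sketched as follows, assuming the trace context lives in the job payload and is injected only when absent. The `trace_context` key, the `app_version` field, and both helper methods are hypothetical and are not taken from this PR; the JSON round trip stands in for one trip through the queue.

```ruby
require 'json'

# Hypothetical payload key for the propagated context.
TRACE_CONTEXT_KEY = 'trace_context'

# Inject only when no context is present, so a re-enqueued job
# keeps the context of the process that originally enqueued it.
def inject_context(job, context)
  job[TRACE_CONTEXT_KEY] ||= context
  job
end

# One trip through the queue: the payload is serialized and parsed again.
def through_queue(job)
  JSON.parse(JSON.generate(job))
end

original = { 'trace_id' => 'v1-trace', 'parent_id' => 'enqueue-span', 'app_version' => 'v1' }
job = inject_context({ 'class' => 'ImportJob', 'args' => ['bad-param'] }, original)

seen = []
3.times do |attempt|
  job = through_queue(job)
  seen << job[TRACE_CONTEXT_KEY]['trace_id']

  # The job fails and is re-enqueued by a newer deployment (v2),
  # which has its own active context at that moment.
  job['retry_count'] = attempt
  job = inject_context(job, { 'trace_id' => "v2-trace-#{attempt}", 'app_version' => 'v2' })
end

puts seen.inspect
puts job[TRACE_CONTEXT_KEY]['app_version']
```

Because the guard never overwrites an existing context, every attempt reports the trace of the v1 process that first enqueued the job, even after v2 is deployed.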

How is this solved in other languages or frameworks, e.g. Java/Spring, Kafka, etc.?

sled · 2023-03-21T17:27:40Z

I've implemented a Sidekiq middleware based on your PR, here's how the distributed traces look:

TonyCTHsu · 2023-03-22T09:53:38Z

👋 @sled, thanks for sharing! Tracers for other languages use synthetic spans to fill the gap, which did not fully address the UI issue, but Datadog is working on an alternative solution for asynchronous tracing instead of implementing distributed tracing.

Since I don't know when the alternative solution will be available and your graph looks fine, if this makes sense to you, I believe we should move forward!

davidgm0 · 2023-04-04T12:30:46Z

Hi, I wanted to ask if there's a timeline for this PR getting merged. Is there something missing? I tried it and at least for me it's looking good!

Appraisals

TonyCTHsu added 4 commits January 2, 2023 22:46

Implementation for distributed tracing

76d75c7

Change scope for Sidekiq::Extensions.enable_delay!

3ec5c35

Spec for distributed tracing

325cdd2

Rubocop

1ecb281

github-actions bot added the integrations and tracing labels Jan 2, 2023

marcotc reviewed Jan 3, 2023

lib/datadog/tracing/contrib/sidekiq/client_tracer.rb

TonyCTHsu self-assigned this Jan 4, 2023

marcotc reviewed Jan 4, 2023

lib/datadog/tracing/contrib/sidekiq/distributed/propagation.rb (outdated)

marcotc reviewed Jan 4, 2023

lib/datadog/tracing/contrib/sidekiq/distributed/propagation.rb (outdated)

marcotc reviewed Jan 4, 2023

lib/datadog/tracing/contrib/sidekiq/server_tracer.rb (outdated)

marcotc reviewed Jan 5, 2023

spec/datadog/tracing/contrib/sidekiq/distributed_tracing_spec.rb (outdated)

Merge branch 'master' into tonycthsu/sidekiq-distributed-tracing

32eef81

Merge branch 'master' into tonycthsu/sidekiq-distributed-tracing

aace12e

TonyCTHsu force-pushed the tonycthsu/sidekiq-distributed-tracing branch from 4f6d082 to aace12e April 4, 2023 13:02

TonyCTHsu added 6 commits April 5, 2023 11:42

Remove fetcher

de1bac2

Remove sorbet typed comment

651f3e4

Refactor propagation instance

ee7f4a0

Add negative distributed tracing tests

5c21330

Port sidekiq test helper for testing Sidekiq 3.x

70319c7

Merge branch 'master' into tonycthsu/sidekiq-distributed-tracing

c2e3a13

TonyCTHsu force-pushed the tonycthsu/sidekiq-distributed-tracing branch from 2b52318 to c2e3a13 April 5, 2023 13:56

TonyCTHsu marked this pull request as ready for review April 5, 2023 16:02

TonyCTHsu requested a review from a team April 5, 2023 16:02

Improve tests

1bfeb31

marcotc reviewed Apr 5, 2023

Appraisals (outdated)

TonyCTHsu added 2 commits April 5, 2023 22:29

Adding documentation

80e29c0

Cleanup

bb84c07

TonyCTHsu added this to the 1.11.0 milestone Apr 12, 2023

marcotc approved these changes Apr 12, 2023

TonyCTHsu merged commit 7fc111a into master Apr 13, 2023

TonyCTHsu deleted the tonycthsu/sidekiq-distributed-tracing branch April 13, 2023 13:32

TonyCTHsu restored the tonycthsu/sidekiq-distributed-tracing branch April 13, 2023 13:36

lloeki modified the milestones: 1.11.0, 1.11.0.beta1 Apr 14, 2023

GustavoCaso deleted the tonycthsu/sidekiq-distributed-tracing branch May 19, 2023 14:52
