Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Distributed tracing for Sidekiq #2513

Merged
merged 15 commits into from
Apr 13, 2023
Merged

Conversation

TonyCTHsu
Copy link
Contributor

@TonyCTHsu TonyCTHsu commented Jan 2, 2023

What does this PR do?
#2295

Implement distributed tracing for sidekiq

@github-actions github-actions bot added integrations Involves tracing integrations tracing labels Jan 2, 2023
@TonyCTHsu TonyCTHsu self-assigned this Jan 4, 2023
@sled
Copy link

sled commented Mar 13, 2023

Is there any way I could help getting this PR merged? @TonyCTHsu

@TonyCTHsu
Copy link
Contributor Author

TonyCTHsu commented Mar 21, 2023

👋 @sled , thanks for offering to help.

I hesitated to continue this due to the fact that it is difficult to make sense of distributed tracing for asynchronous process in UI. The mechanism for distributed tracing is by context propagation, however, asynchronous process leaves a huge gap between the time when a job being pushed into a queue and before picked up by workers (a typical http request/response cycle would be milliseconds, while this gap could take seconds or longer, depends on how long it has been sitting and waiting inside of the queue). This gap takes too much space for the entire trace graph and makes the spans (actual code execution) tiny and hard to read.

Furthermore, how does the entire trace expected to look like when a job fail and push back into the queue, then retry several times before it is considered dead? The duration of the trace could easily extended beyond days and weeks for asynchronous process.

If you are interested in this feature, perhaps we could release it in a opt-in configuration, how does that sound?

@sled
Copy link

sled commented Mar 21, 2023

@TonyCTHsu if this is just a display issue, maybe the Datadog UI team could have a look at it? The big gaps between enqueueing and execution of the job could be squished for example.

I think distributed tracing is an excellent fit for asynchronous systems like background jobs or event processing because it allows you to stitch together the whole picture starting from the initiator. Retries should also continue the original trace, a common scenario is a faulty application (v1) which enqueued jobs with wrong parameters. This gets fixed in v2, but you might still see job retries. With distributed tracing, you can easily trace those retried jobs to the faulty version (v1) which originally enqueued the job.

How is this solved in other languages or frameworks, i.e. Java/Spring, Kafka etc.?

@sled
Copy link

sled commented Mar 21, 2023

I've implemented a Sidekiq middleware based on your PR, here's how the distributed traces look:

Screenshot 2023-03-21 at 18 24 49

@TonyCTHsu
Copy link
Contributor Author

👋 @sled , thanks for sharing! Other languages tracers have synthetic spans to fill the gap which did not fully address the UI issue, but Datadog is working on alternative solution for asynchronous tracing instead of implementing distributed tracing.

Since I don't know when the alternative solution would be available and your graph looks fine, if this make sense to you, I believe we should move forward!

@davidgm0
Copy link

davidgm0 commented Apr 4, 2023

Hi, I wanted to ask if there's a timeline for this PR getting merged. Is there something missing? I tried it and at least for me it's looking good!

@TonyCTHsu TonyCTHsu force-pushed the tonycthsu/sidekiq-distributed-tracing branch from 4f6d082 to aace12e Compare April 4, 2023 13:02
@TonyCTHsu TonyCTHsu force-pushed the tonycthsu/sidekiq-distributed-tracing branch from 2b52318 to c2e3a13 Compare April 5, 2023 13:56
@TonyCTHsu TonyCTHsu marked this pull request as ready for review April 5, 2023 16:02
@TonyCTHsu TonyCTHsu requested a review from a team April 5, 2023 16:02
Appraisals Outdated Show resolved Hide resolved
@TonyCTHsu TonyCTHsu added this to the 1.11.0 milestone Apr 12, 2023
@TonyCTHsu TonyCTHsu merged commit 7fc111a into master Apr 13, 2023
@TonyCTHsu TonyCTHsu deleted the tonycthsu/sidekiq-distributed-tracing branch April 13, 2023 13:32
@TonyCTHsu TonyCTHsu restored the tonycthsu/sidekiq-distributed-tracing branch April 13, 2023 13:36
@lloeki lloeki modified the milestones: 1.11.0, 1.11.0.beta1 Apr 14, 2023
@GustavoCaso GustavoCaso deleted the tonycthsu/sidekiq-distributed-tracing branch May 19, 2023 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
integrations Involves tracing integrations tracing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants