
telemetry worker: flush data after stops #515

Open: cataphract wants to merge 1 commit into main from glopes/flush-data-after-stop

@cataphract (Contributor) commented Jul 1, 2024

Telemetry workers are functionally dead after a Stop lifecycle action unless another Start follows it. AddPoint actions are still processed, but their data is never flushed, because the Stop action handler unschedules the FlushMetrics and FlushData actions.

PHP sends a Stop action at the end of every request via ddog_sidecar_telemetry_end(), but a Start action is generated only once, right after the telemetry worker is spawned. Since no further Start actions are generated, no metrics can be sent after the first Stop.
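A heavily simplified, hypothetical model of the current behaviour may help make the failure mode concrete. `Action`, `Worker`, the single `flushes_scheduled` flag (standing in for the FlushMetrics/FlushData deadlines) and `tick()` (standing in for the flush interval firing) are illustrative assumptions, not the real ddtelemetry types. It replays what PHP sends: one Start right after the worker is spawned, then AddPoint and Stop for each request.

```rust
// Hypothetical, heavily simplified model of the *current* lifecycle handling.
// None of these types are the real ddtelemetry TelemetryWorker.
#[derive(Debug, Clone, Copy)]
enum Action {
    Start,
    Stop,
    AddPoint(u32),
}

#[derive(Default)]
struct Worker {
    started: bool,
    flushes_scheduled: bool, // stands in for the FlushMetrics/FlushData deadlines
    buffered: Vec<u32>,      // points collected but not yet sent
    sent: Vec<u32>,          // points that reached the backend
}

impl Worker {
    fn handle(&mut self, action: Action) {
        match action {
            Action::Start => {
                self.started = true;
                self.flushes_scheduled = true;
            }
            Action::Stop => {
                if !self.started {
                    return; // a repeated Stop is a no-op
                }
                self.flush(); // Stop performs a flush of its own
                self.started = false;
                self.flushes_scheduled = false; // like deadlines.clear_pending()
            }
            Action::AddPoint(p) => self.buffered.push(p), // still accepted
        }
    }

    /// The flush interval fires: FlushData only runs if still scheduled and started.
    fn tick(&mut self) {
        if self.flushes_scheduled && self.started {
            self.flush();
        }
    }

    fn flush(&mut self) {
        self.sent.append(&mut self.buffered);
    }
}

fn simulate() -> Worker {
    let mut w = Worker::default();
    w.handle(Action::Start); // only sent right after the worker is spawned
    w.handle(Action::AddPoint(1)); // request 1
    w.handle(Action::Stop); // end of request 1
    w.handle(Action::AddPoint(2)); // request 2
    w.handle(Action::Stop); // end of request 2: ignored, started is false
    w.tick(); // interval elapses, but nothing is scheduled any more
    w
}

fn main() {
    let w = simulate();
    println!("sent = {:?}, stuck = {:?}", w.sent, w.buffered);
}
```

Running it should print `sent = [1], stuck = [2]`: the point recorded during the second request is accepted but never leaves the worker.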

It is not clear to me whether the intention is to have a Start/Stop pair on every PHP request (where Stop flushes the metrics), or to have only one such pair, in the first request, with the Stop event generated by ddog_sidecar_telemetry_end() effectively a noop. Judging by this comment (#391):

> Also allow the telemetry worker to have a mode where it's continuing
> execution after a start-stop cycle, otherwise it won't send any more metrics afterwards.

it would appear that the intention is to keep sending metrics after a Start/Stop pair. This also makes more sense, since data would then be flushed on the interval rather than after every request via Stop. In that case:

  • The Stop action handler should not unschedule FlushData and FlushMetrics events and
  • FlushData, if called outside a Start-Stop pair, should not be a noop.
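Under the same kind of simplification, the two changes listed above could look roughly like this. `restartable` is meant to mirror `self.config.restartable` from the diff; the rest (`flushes_scheduled`, `tick()`, `simulate()`) is hypothetical scaffolding, and the real worker schedules separate deadlines rather than toggling one flag.

```rust
// Hypothetical, simplified model of the two proposed changes; the names follow
// the PR discussion, but this is not the real ddtelemetry code.
#[derive(Default)]
struct Worker {
    restartable: bool,       // stands in for self.config.restartable
    started: bool,
    flushes_scheduled: bool, // stands in for the FlushMetrics/FlushData deadlines
    buffered: Vec<u32>,
    sent: Vec<u32>,
}

impl Worker {
    fn start(&mut self) {
        self.started = true;
        self.flushes_scheduled = true;
    }

    fn stop(&mut self) {
        if !self.started {
            return; // a repeated Stop is still a no-op
        }
        self.flush();
        self.started = false;
        // Change 1: only a non-restartable worker drops its periodic flushes.
        if !self.restartable {
            self.flushes_scheduled = false;
        }
    }

    fn add_point(&mut self, p: u32) {
        self.buffered.push(p);
    }

    // Change 2: the periodic FlushData no longer requires `started`.
    fn tick(&mut self) {
        if self.flushes_scheduled {
            self.flush();
        }
    }

    fn flush(&mut self) {
        self.sent.append(&mut self.buffered);
    }
}

/// Replays the PHP sequence: Start once, then AddPoint + Stop per request.
fn simulate(restartable: bool) -> Worker {
    let mut w = Worker { restartable, ..Worker::default() };
    w.start();
    w.add_point(1);
    w.stop();
    w.add_point(2);
    w.stop();
    w.tick();
    w
}

fn main() {
    for restartable in [false, true] {
        let w = simulate(restartable);
        println!("restartable={}: sent={:?}, stuck={:?}", restartable, w.sent, w.buffered);
    }
}
```

With `restartable = true` the interval flush picks up the point from the second request; with `restartable = false` the previous behaviour is kept.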

Finally: swap the order in which FlushData and FlushMetrics are scheduled so that FlushMetrics runs first and therefore its generated data can be sent by the next FlushData.
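Why the order matters can be sketched with a toy model in which both tasks fall due at the same deadline. The assumption here, taken from the description above, is that FlushMetrics turns aggregated points into a payload and FlushData sends whatever payloads are queued; `Task`, `Worker` and `payloads_sent_in_one_interval` are hypothetical names.

```rust
// Hypothetical model: FlushMetrics turns aggregated points into a payload,
// FlushData sends whatever payloads are queued. Not the real ddtelemetry types.
#[derive(Clone, Copy, Debug)]
enum Task {
    FlushMetrics,
    FlushData,
}

#[derive(Default)]
struct Worker {
    metric_points: Vec<u32>, // collected via AddPoint
    outbox: Vec<Vec<u32>>,   // payloads produced by FlushMetrics, not yet sent
    sent: Vec<Vec<u32>>,     // payloads sent by FlushData
}

impl Worker {
    fn run(&mut self, task: Task) {
        match task {
            Task::FlushMetrics => {
                if !self.metric_points.is_empty() {
                    let payload = std::mem::take(&mut self.metric_points);
                    self.outbox.push(payload);
                }
            }
            Task::FlushData => self.sent.append(&mut self.outbox),
        }
    }
}

/// Runs the two tasks due at the same deadline in `order` and returns
/// how many payloads reached the backend during that interval.
fn payloads_sent_in_one_interval(order: [Task; 2]) -> usize {
    let mut w = Worker::default();
    w.metric_points = vec![1, 2, 3];
    for task in order {
        w.run(task);
    }
    w.sent.len()
}

fn main() {
    let metrics_first = payloads_sent_in_one_interval([Task::FlushMetrics, Task::FlushData]);
    let data_first = payloads_sent_in_one_interval([Task::FlushData, Task::FlushMetrics]);
    println!("FlushMetrics first: {}, FlushData first: {}", metrics_first, data_first);
}
```

Under these assumptions, running FlushData first leaves the freshly built payload waiting a full interval for the next FlushData.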

@cataphract requested a review from a team as a code owner (July 1, 2024 13:55)
@cataphract requested review from pawelchcki, bwoebi and bantonsson and removed request for a team and pawelchcki (July 1, 2024 13:55)
@pr-commenter (bot) commented Jul 1, 2024

Benchmarks

Comparison

Benchmark execution time: 2024-10-10 12:22:42

Comparing candidate commit 9c34541 in PR branch glopes/flush-data-after-stop with baseline commit f363618 in branch main.

Found 1 performance improvement and 0 performance regressions! Performance is the same for 50 metrics, and 2 metrics are unstable.

scenario:benching deserializing traces from msgpack to their internal representation

  • 🟩 execution_time [-36.298ns; -26.456ns] or [-2.989%; -2.178%]

Candidate


Group 1

| cpu_model | git_commit_sha | git_commit_date | git_branch |
| --- | --- | --- | --- |
| Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz | 9c34541 | 1728392591 | glopes/flush-data-after-stop |

| scenario | metric | min | mean ± sd | median ± mad | p75 | p95 | p99 | max | peak_to_median_ratio | skewness | kurtosis | cv | sem | runs | sample_size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| write only interface | execution_time | 1.373µs | 3.170µs ± 1.563µs | 3.014µs ± 0.020µs | 3.030µs | 3.084µs | 13.789µs | 18.128µs | 501.39% | 8.034 | 65.437 | 49.17% | 0.110µs | 1 | 200 |

| scenario | metric | 95% CI mean | Shapiro-Wilk pvalue | Ljung-Box pvalue (lag=1) | Dip test pvalue |
| --- | --- | --- | --- | --- | --- |
| write only interface | execution_time | [2.954µs; 3.387µs] or [-6.831%; +6.831%] | None | None | None |

Group 2

| scenario | metric | min | mean ± sd | median ± mad | p75 | p95 | p99 | max | peak_to_median_ratio | skewness | kurtosis | cv | sem | runs | sample_size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| redis/obfuscate_redis_string | execution_time | 38.215µs | 38.806µs ± 1.011µs | 38.342µs ± 0.054µs | 38.443µs | 40.955µs | 41.021µs | 42.037µs | 9.64% | 1.699 | 1.010 | 2.60% | 0.071µs | 1 | 200 |

| scenario | metric | 95% CI mean | Shapiro-Wilk pvalue | Ljung-Box pvalue (lag=1) | Dip test pvalue |
| --- | --- | --- | --- | --- | --- |
| redis/obfuscate_redis_string | execution_time | [38.666µs; 38.946µs] or [-0.361%; +0.361%] | None | None | None |

Group 3

| scenario | metric | min | mean ± sd | median ± mad | p75 | p95 | p99 | max | peak_to_median_ratio | skewness | kurtosis | cv | sem | runs | sample_size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| credit_card/is_card_number/ | execution_time | 2.013µs | 2.014µs ± 0.001µs | 2.014µs ± 0.001µs | 2.015µs | 2.016µs | 2.017µs | 2.030µs | 0.80% | 8.000 | 89.054 | 0.07% | 0.000µs | 1 | 200 |
| credit_card/is_card_number/ | throughput | 492569834.547op/s | 496447840.307op/s ± 334865.222op/s | 496505188.483op/s ± 123506.989op/s | 496619337.627op/s | 496687469.719op/s | 496726409.388op/s | 496745392.224op/s | 0.05% | -7.946 | 88.183 | 0.07% | 23678.547op/s | 1 | 200 |
| credit_card/is_card_number/ 3782-8224-6310-005 | execution_time | 124.089µs | 125.595µs ± 0.547µs | 125.586µs ± 0.373µs | 125.988µs | 126.420µs | 126.781µs | 127.068µs | 1.18% | -0.105 | -0.005 | 0.43% | 0.039µs | 1 | 200 |
| credit_card/is_card_number/ 3782-8224-6310-005 | throughput | 7869819.028op/s | 7962264.477op/s ± 34691.802op/s | 7962670.999op/s ± 23698.266op/s | 7985202.822op/s | 8021305.043op/s | 8043513.216op/s | 8058710.933op/s | 1.21% | 0.131 | 0.004 | 0.43% | 2453.081op/s | 1 | 200 |
| credit_card/is_card_number/ 378282246310005 | execution_time | 114.381µs | 116.904µs ± 0.575µs | 116.953µs ± 0.310µs | 117.239µs | 117.816µs | 118.206µs | 118.296µs | 1.15% | -0.716 | 2.293 | 0.49% | 0.041µs | 1 | 200 |
| credit_card/is_card_number/ 378282246310005 | throughput | 8453344.615op/s | 8554236.542op/s ± 42216.863op/s | 8550452.267op/s ± 22586.917op/s | 8575683.451op/s | 8617377.603op/s | 8682438.973op/s | 8742733.873op/s | 2.25% | 0.772 | 2.458 | 0.49% | 2985.183op/s | 1 | 200 |
| credit_card/is_card_number/37828224631 | execution_time | 2.013µs | 2.014µs ± 0.001µs | 2.014µs ± 0.000µs | 2.014µs | 2.015µs | 2.016µs | 2.031µs | 0.85% | 10.450 | 130.863 | 0.07% | 0.000µs | 1 | 200 |
| credit_card/is_card_number/37828224631 | throughput | 492283981.525op/s | 496479933.054op/s ± 329401.667op/s | 496481605.357op/s ± 118257.981op/s | 496633333.646op/s | 496704828.654op/s | 496731563.344op/s | 496761942.572op/s | 0.06% | -10.403 | 130.059 | 0.07% | 23292.215op/s | 1 | 200 |
| credit_card/is_card_number/378282246310005 | execution_time | 110.756µs | 113.104µs ± 0.651µs | 113.120µs ± 0.426µs | 113.530µs | 114.231µs | 114.602µs | 114.652µs | 1.35% | -0.175 | 0.484 | 0.57% | 0.046µs | 1 | 200 |
| credit_card/is_card_number/378282246310005 | throughput | 8722069.777op/s | 8841734.623op/s ± 50978.375op/s | 8840143.392op/s ± 33286.037op/s | 8874497.138op/s | 8924448.101op/s | 8966107.410op/s | 9028872.712op/s | 2.13% | 0.218 | 0.540 | 0.58% | 3604.715op/s | 1 | 200 |
| credit_card/is_card_number/37828224631000521389798 | execution_time | 112.386µs | 113.477µs ± 0.373µs | 113.405µs ± 0.230µs | 113.701µs | 114.164µs | 114.383µs | 114.747µs | 1.18% | 0.450 | 0.444 | 0.33% | 0.026µs | 1 | 200 |
| credit_card/is_card_number/37828224631000521389798 | throughput | 8714849.536op/s | 8812475.188op/s ± 28908.905op/s | 8817969.953op/s ± 17896.677op/s | 8833225.016op/s | 8849580.709op/s | 8878125.069op/s | 8897908.102op/s | 0.91% | -0.428 | 0.433 | 0.33% | 2044.168op/s | 1 | 200 |
| credit_card/is_card_number/x371413321323331 | execution_time | 23.346µs | 24.131µs ± 0.400µs | 24.103µs ± 0.290µs | 24.396µs | 24.885µs | 24.995µs | 25.054µs | 3.95% | 0.261 | -0.573 | 1.65% | 0.028µs | 1 | 200 |
| credit_card/is_card_number/x371413321323331 | throughput | 39914108.685op/s | 41452492.784op/s ± 684176.709op/s | 41489073.853op/s ± 499466.076op/s | 41972511.327op/s | 42497800.329op/s | 42741689.923op/s | 42834022.806op/s | 3.24% | -0.194 | -0.608 | 1.65% | 48378.599op/s | 1 | 200 |
| credit_card/is_card_number_no_luhn/ | execution_time | 2.013µs | 2.014µs ± 0.001µs | 2.014µs ± 0.000µs | 2.015µs | 2.015µs | 2.016µs | 2.018µs | 0.19% | 1.205 | 6.065 | 0.03% | 0.000µs | 1 | 200 |
| credit_card/is_card_number_no_luhn/ | throughput | 495522927.891op/s | 496490488.487op/s ± 156187.759op/s | 496474059.233op/s ± 108697.147op/s | 496621523.287op/s | 496702182.624op/s | 496751028.167op/s | 496810059.158op/s | 0.07% | -1.199 | 6.021 | 0.03% | 11044.142op/s | 1 | 200 |
| credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 | execution_time | 97.419µs | 98.764µs ± 0.617µs | 98.716µs ± 0.424µs | 99.159µs | 99.856µs | 100.425µs | 100.850µs | 2.16% | 0.560 | 0.262 | 0.62% | 0.044µs | 1 | 200 |
| credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 | throughput | 9915754.004op/s | 10125531.944op/s ± 63065.723op/s | 10130024.830op/s ± 43591.617op/s | 10172750.254op/s | 10217405.759op/s | 10233464.697op/s | 10264904.994op/s | 1.33% | -0.524 | 0.195 | 0.62% | 4459.420op/s | 1 | 200 |
| credit_card/is_card_number_no_luhn/ 378282246310005 | execution_time | 88.987µs | 90.463µs ± 0.596µs | 90.435µs ± 0.424µs | 90.864µs | 91.515µs | 91.765µs | 91.942µs | 1.67% | 0.101 | -0.441 | 0.66% | 0.042µs | 1 | 200 |
| credit_card/is_card_number_no_luhn/ 378282246310005 | throughput | 10876366.674op/s | 11054721.201op/s ± 72846.602op/s | 11057685.004op/s ± 51765.453op/s | 11109115.309op/s | 11172475.008op/s | 11204475.884op/s | 11237544.169op/s | 1.63% | -0.071 | -0.445 | 0.66% | 5151.033op/s | 1 | 200 |
| credit_card/is_card_number_no_luhn/37828224631 | execution_time | 2.013µs | 2.014µs ± 0.001µs | 2.014µs ± 0.001µs | 2.015µs | 2.016µs | 2.018µs | 2.031µs | 0.85% | 8.450 | 95.231 | 0.07% | 0.000µs | 1 | 200 |
| credit_card/is_card_number_no_luhn/37828224631 | throughput | 492339924.573op/s | 496460274.099op/s ± 350246.388op/s | 496500546.156op/s ± 124901.433op/s | 496627571.209op/s | 496706878.415op/s | 496728749.558op/s | 496853859.319op/s | 0.07% | -8.395 | 94.337 | 0.07% | 24766.160op/s | 1 | 200 |
| credit_card/is_card_number_no_luhn/378282246310005 | execution_time | 85.315µs | 86.489µs ± 0.473µs | 86.514µs ± 0.334µs | 86.810µs | 87.282µs | 87.518µs | 87.645µs | 1.31% | 0.099 | -0.407 | 0.55% | 0.033µs | 1 | 200 |
| credit_card/is_card_number_no_luhn/378282246310005 | throughput | 11409683.064op/s | 11562569.119op/s ± 63229.962op/s | 11558828.033op/s ± 44523.768op/s | 11606376.849op/s | 11665720.826op/s | 11688244.458op/s | 11721334.567op/s | 1.41% | -0.073 | -0.414 | 0.55% | 4471.033op/s | 1 | 200 |
| credit_card/is_card_number_no_luhn/37828224631000521389798 | execution_time | 112.821µs | 113.563µs ± 0.373µs | 113.529µs ± 0.284µs | 113.838µs | 114.234µs | 114.428µs | 114.436µs | 0.80% | 0.419 | -0.554 | 0.33% | 0.026µs | 1 | 200 |
| credit_card/is_card_number_no_luhn/37828224631000521389798 | throughput | 8738508.003op/s | 8805806.982op/s ± 28847.328op/s | 8808295.953op/s ± 22054.013op/s | 8829217.707op/s | 8845117.725op/s | 8854729.572op/s | 8863606.034op/s | 0.63% | -0.407 | -0.566 | 0.33% | 2039.814op/s | 1 | 200 |
| credit_card/is_card_number_no_luhn/x371413321323331 | execution_time | 23.252µs | 24.099µs ± 0.372µs | 24.080µs ± 0.286µs | 24.383µs | 24.807µs | 24.918µs | 25.040µs | 3.99% | 0.259 | -0.510 | 1.54% | 0.026µs | 1 | 200 |
| credit_card/is_card_number_no_luhn/x371413321323331 | throughput | 39935387.430op/s | 41504850.644op/s ± 638941.067op/s | 41527712.410op/s ± 497742.362op/s | 42008739.157op/s | 42501098.186op/s | 42704420.657op/s | 43007721.700op/s | 3.56% | -0.193 | -0.545 | 1.54% | 45179.956op/s | 1 | 200 |

| scenario | metric | 95% CI mean | Shapiro-Wilk pvalue | Ljung-Box pvalue (lag=1) | Dip test pvalue |
| --- | --- | --- | --- | --- | --- |
| credit_card/is_card_number/ | execution_time | [2.014µs; 2.015µs] or [-0.009%; +0.009%] | None | None | None |
| credit_card/is_card_number/ | throughput | [496401431.208op/s; 496494249.406op/s] or [-0.009%; +0.009%] | None | None | None |
| credit_card/is_card_number/ 3782-8224-6310-005 | execution_time | [125.519µs; 125.671µs] or [-0.060%; +0.060%] | None | None | None |
| credit_card/is_card_number/ 3782-8224-6310-005 | throughput | [7957456.526op/s; 7967072.427op/s] or [-0.060%; +0.060%] | None | None | None |
| credit_card/is_card_number/ 378282246310005 | execution_time | [116.824µs; 116.984µs] or [-0.068%; +0.068%] | None | None | None |
| credit_card/is_card_number/ 378282246310005 | throughput | [8548385.691op/s; 8560087.393op/s] or [-0.068%; +0.068%] | None | None | None |
| credit_card/is_card_number/37828224631 | execution_time | [2.014µs; 2.014µs] or [-0.009%; +0.009%] | None | None | None |
| credit_card/is_card_number/37828224631 | throughput | [496434281.151op/s; 496525584.957op/s] or [-0.009%; +0.009%] | None | None | None |
| credit_card/is_card_number/378282246310005 | execution_time | [113.013µs; 113.194µs] or [-0.080%; +0.080%] | None | None | None |
| credit_card/is_card_number/378282246310005 | throughput | [8834669.511op/s; 8848799.736op/s] or [-0.080%; +0.080%] | None | None | None |
| credit_card/is_card_number/37828224631000521389798 | execution_time | [113.425µs; 113.528µs] or [-0.046%; +0.046%] | None | None | None |
| credit_card/is_card_number/37828224631000521389798 | throughput | [8808468.692op/s; 8816481.684op/s] or [-0.045%; +0.045%] | None | None | None |
| credit_card/is_card_number/x371413321323331 | execution_time | [24.075µs; 24.186µs] or [-0.230%; +0.230%] | None | None | None |
| credit_card/is_card_number/x371413321323331 | throughput | [41357672.472op/s; 41547313.096op/s] or [-0.229%; +0.229%] | None | None | None |
| credit_card/is_card_number_no_luhn/ | execution_time | [2.014µs; 2.014µs] or [-0.004%; +0.004%] | None | None | None |
| credit_card/is_card_number_no_luhn/ | throughput | [496468842.366op/s; 496512134.608op/s] or [-0.004%; +0.004%] | None | None | None |
| credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 | execution_time | [98.679µs; 98.850µs] or [-0.087%; +0.087%] | None | None | None |
| credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 | throughput | [10116791.642op/s; 10134272.247op/s] or [-0.086%; +0.086%] | None | None | None |
| credit_card/is_card_number_no_luhn/ 378282246310005 | execution_time | [90.380µs; 90.546µs] or [-0.091%; +0.091%] | None | None | None |
| credit_card/is_card_number_no_luhn/ 378282246310005 | throughput | [11044625.363op/s; 11064817.039op/s] or [-0.091%; +0.091%] | None | None | None |
| credit_card/is_card_number_no_luhn/37828224631 | execution_time | [2.014µs; 2.014µs] or [-0.010%; +0.010%] | None | None | None |
| credit_card/is_card_number_no_luhn/37828224631 | throughput | [496411733.319op/s; 496508814.880op/s] or [-0.010%; +0.010%] | None | None | None |
| credit_card/is_card_number_no_luhn/378282246310005 | execution_time | [86.423µs; 86.554µs] or [-0.076%; +0.076%] | None | None | None |
| credit_card/is_card_number_no_luhn/378282246310005 | throughput | [11553806.055op/s; 11571332.184op/s] or [-0.076%; +0.076%] | None | None | None |
| credit_card/is_card_number_no_luhn/37828224631000521389798 | execution_time | [113.511µs; 113.614µs] or [-0.045%; +0.045%] | None | None | None |
| credit_card/is_card_number_no_luhn/37828224631000521389798 | throughput | [8801809.019op/s; 8809804.944op/s] or [-0.045%; +0.045%] | None | None | None |
| credit_card/is_card_number_no_luhn/x371413321323331 | execution_time | [24.048µs; 24.151µs] or [-0.214%; +0.214%] | None | None | None |
| credit_card/is_card_number_no_luhn/x371413321323331 | throughput | [41416299.557op/s; 41593401.730op/s] or [-0.213%; +0.213%] | None | None | None |

Group 4

| scenario | metric | min | mean ± sd | median ± mad | p75 | p95 | p99 | max | peak_to_median_ratio | skewness | kurtosis | cv | sem | runs | sample_size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| normalization/normalize_trace/test_trace | execution_time | 292.880ns | 303.793ns ± 15.063ns | 296.193ns ± 2.732ns | 311.811ns | 335.737ns | 352.795ns | 353.964ns | 19.50% | 1.727 | 2.335 | 4.95% | 1.065ns | 1 | 200 |

| scenario | metric | 95% CI mean | Shapiro-Wilk pvalue | Ljung-Box pvalue (lag=1) | Dip test pvalue |
| --- | --- | --- | --- | --- | --- |
| normalization/normalize_trace/test_trace | execution_time | [301.705ns; 305.880ns] or [-0.687%; +0.687%] | None | None | None |

Group 5

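| scenario | metric | min | mean ± sd | median ± mad | p75 | p95 | p99 | max | peak_to_median_ratio | skewness | kurtosis | cv | sem | runs | sample_size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| tags/replace_trace_tags | execution_time | 2.736µs | 2.765µs ± 0.015µs | 2.762µs ± 0.010µs | 2.774µs | 2.798µs | 2.802µs | 2.803µs | 1.50% | 0.657 | -0.193 | 0.56% | 0.001µs | 1 | 200 |

| scenario | metric | 95% CI mean | Shapiro-Wilk pvalue | Ljung-Box pvalue (lag=1) | Dip test pvalue |
| --- | --- | --- | --- | --- | --- |
| tags/replace_trace_tags | execution_time | [2.763µs; 2.767µs] or [-0.078%; +0.078%] | None | None | None |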

Group 6

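| scenario | metric | min | mean ± sd | median ± mad | p75 | p95 | p99 | max | peak_to_median_ratio | skewness | kurtosis | cv | sem | runs | sample_size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| two way interface | execution_time | 17.898µs | 22.125µs ± 7.728µs | 18.396µs ± 0.115µs | 18.621µs | 35.829µs | 37.249µs | 74.542µs | 305.20% | 2.485 | 9.472 | 34.84% | 0.546µs | 1 | 200 |

| scenario | metric | 95% CI mean | Shapiro-Wilk pvalue | Ljung-Box pvalue (lag=1) | Dip test pvalue |
| --- | --- | --- | --- | --- | --- |
| two way interface | execution_time | [21.054µs; 23.196µs] or [-4.841%; +4.841%] | None | None | None |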

Group 7

| scenario | metric | min | mean ± sd | median ± mad | p75 | p95 | p99 | max | peak_to_median_ratio | skewness | kurtosis | cv | sem | runs | sample_size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... | execution_time | 506.041µs | 507.164µs ± 1.061µs | 506.987µs ± 0.335µs | 507.393µs | 508.192µs | 508.832µs | 519.908µs | 2.55% | 8.799 | 102.376 | 0.21% | 0.075µs | 1 | 200 |
| normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... | throughput | 1923418.090op/s | 1971756.258op/s ± 4051.034op/s | 1972438.651op/s ± 1301.890op/s | 1973434.095op/s | 1974965.030op/s | 1975732.308op/s | 1976126.072op/s | 0.19% | -8.631 | 99.644 | 0.20% | 286.451op/s | 1 | 200 |
| normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて | execution_time | 468.243µs | 468.899µs ± 0.260µs | 468.904µs ± 0.184µs | 469.076µs | 469.327µs | 469.512µs | 469.733µs | 0.18% | 0.148 | 0.002 | 0.06% | 0.018µs | 1 | 200 |
| normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて | throughput | 2128869.907op/s | 2132654.388op/s ± 1184.589op/s | 2132632.094op/s ± 837.333op/s | 2133476.943op/s | 2134399.413op/s | 2135307.399op/s | 2135643.965op/s | 0.14% | -0.144 | -0.001 | 0.06% | 83.763op/s | 1 | 200 |
| normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters | execution_time | 180.256µs | 180.663µs ± 0.151µs | 180.669µs ± 0.094µs | 180.756µs | 180.905µs | 181.029µs | 181.130µs | 0.25% | 0.051 | 0.284 | 0.08% | 0.011µs | 1 | 200 |
| normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters | throughput | 5520910.719op/s | 5535170.402op/s ± 4628.020op/s | 5534975.273op/s ± 2869.325op/s | 5538001.292op/s | 5542893.204op/s | 5545264.837op/s | 5547673.799op/s | 0.23% | -0.045 | 0.281 | 0.08% | 327.250op/s | 1 | 200 |
| normalization/normalize_service/normalize_service/[empty string] | execution_time | 44.313µs | 44.469µs ± 0.059µs | 44.466µs ± 0.028µs | 44.494µs | 44.540µs | 44.584µs | 44.895µs | 0.96% | 3.066 | 20.140 | 0.13% | 0.004µs | 1 | 200 |
| normalization/normalize_service/normalize_service/[empty string] | throughput | 22274360.726op/s | 22487517.623op/s ± 29888.898op/s | 22488883.815op/s ± 14036.330op/s | 22503289.348op/s | 22525731.778op/s | 22540034.887op/s | 22566652.724op/s | 0.35% | -3.016 | 19.726 | 0.13% | 2113.464op/s | 1 | 200 |
| normalization/normalize_service/normalize_service/test_ASCII | execution_time | 49.046µs | 49.166µs ± 0.047µs | 49.163µs ± 0.030µs | 49.194µs | 49.246µs | 49.271µs | 49.405µs | 0.49% | 0.694 | 2.575 | 0.10% | 0.003µs | 1 | 200 |
| normalization/normalize_service/normalize_service/test_ASCII | throughput | 20240788.923op/s | 20339298.774op/s ± 19509.150op/s | 20340676.958op/s ± 12303.708op/s | 20352090.027op/s | 20364989.955op/s | 20383088.972op/s | 20388860.886op/s | 0.24% | -0.682 | 2.528 | 0.10% | 1379.505op/s | 1 | 200 |

| scenario | metric | 95% CI mean | Shapiro-Wilk pvalue | Ljung-Box pvalue (lag=1) | Dip test pvalue |
| --- | --- | --- | --- | --- | --- |
| normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... | execution_time | [507.017µs; 507.311µs] or [-0.029%; +0.029%] | None | None | None |
| normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... | throughput | [1971194.823op/s; 1972317.692op/s] or [-0.028%; +0.028%] | None | None | None |
| normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて | execution_time | [468.863µs; 468.935µs] or [-0.008%; +0.008%] | None | None | None |
| normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて | throughput | [2132490.216op/s; 2132818.561op/s] or [-0.008%; +0.008%] | None | None | None |
| normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters | execution_time | [180.642µs; 180.684µs] or [-0.012%; +0.012%] | None | None | None |
| normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters | throughput | [5534529.003op/s; 5535811.801op/s] or [-0.012%; +0.012%] | None | None | None |
| normalization/normalize_service/normalize_service/[empty string] | execution_time | [44.461µs; 44.477µs] or [-0.018%; +0.018%] | None | None | None |
| normalization/normalize_service/normalize_service/[empty string] | throughput | [22483375.309op/s; 22491659.937op/s] or [-0.018%; +0.018%] | None | None | None |
| normalization/normalize_service/normalize_service/test_ASCII | execution_time | [49.159µs; 49.172µs] or [-0.013%; +0.013%] | None | None | None |
| normalization/normalize_service/normalize_service/test_ASCII | throughput | [20336594.993op/s; 20342002.554op/s] or [-0.013%; +0.013%] | None | None | None |

Group 8

| scenario | metric | min | mean ± sd | median ± mad | p75 | p95 | p99 | max | peak_to_median_ratio | skewness | kurtosis | cv | sem | runs | sample_size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| concentrator/add_spans_to_concentrator | execution_time | 8.910ms | 8.943ms ± 0.019ms | 8.941ms ± 0.010ms | 8.952ms | 8.966ms | 8.994ms | 9.113ms | 1.93% | 3.835 | 29.671 | 0.22% | 0.001ms | 1 | 200 |

| scenario | metric | 95% CI mean | Shapiro-Wilk pvalue | Ljung-Box pvalue (lag=1) | Dip test pvalue |
| --- | --- | --- | --- | --- | --- |
| concentrator/add_spans_to_concentrator | execution_time | [8.940ms; 8.946ms] or [-0.030%; +0.030%] | None | None | None |

Group 9

| scenario | metric | min | mean ± sd | median ± mad | p75 | p95 | p99 | max | peak_to_median_ratio | skewness | kurtosis | cv | sem | runs | sample_size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... | execution_time | 270.685µs | 271.746µs ± 0.476µs | 271.746µs ± 0.249µs | 271.935µs | 272.571µs | 273.149µs | 274.243µs | 0.92% | 1.188 | 3.736 | 0.17% | 0.034µs | 1 | 200 |
| normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... | throughput | 3646404.287op/s | 3679913.627op/s ± 6431.444op/s | 3679907.649op/s ± 3377.292op/s | 3683922.970op/s | 3688796.684op/s | 3692206.660op/s | 3694331.049op/s | 0.39% | -1.165 | 3.631 | 0.17% | 454.772op/s | 1 | 200 |
| normalization/normalize_name/normalize_name/bad-name | execution_time | 26.573µs | 26.621µs ± 0.037µs | 26.624µs ± 0.031µs | 26.641µs | 26.690µs | 26.724µs | 26.742µs | 0.44% | 0.736 | 0.183 | 0.14% | 0.003µs | 1 | 200 |
| normalization/normalize_name/normalize_name/bad-name | throughput | 37394024.296op/s | 37563703.802op/s ± 51773.677op/s | 37560200.665op/s ± 43732.428op/s | 37612736.997op/s | 37625771.326op/s | 37628905.132op/s | 37631553.640op/s | 0.19% | -0.729 | 0.165 | 0.14% | 3660.952op/s | 1 | 200 |
| normalization/normalize_name/normalize_name/good | execution_time | 16.115µs | 16.158µs ± 0.046µs | 16.153µs ± 0.033µs | 16.185µs | 16.266µs | 16.296µs | 16.323µs | 1.05% | 1.372 | 1.476 | 0.29% | 0.003µs | 1 | 200 |
| normalization/normalize_name/normalize_name/good | throughput | 61261759.347op/s | 61889802.880op/s ± 177156.896op/s | 61907973.047op/s ± 128308.949op/s | 62036504.638op/s | 62045048.910op/s | 62047819.147op/s | 62055450.905op/s | 0.24% | -1.359 | 1.428 | 0.29% | 12526.884op/s | 1 | 200 |

| scenario | metric | 95% CI mean | Shapiro-Wilk pvalue | Ljung-Box pvalue (lag=1) | Dip test pvalue |
| --- | --- | --- | --- | --- | --- |
| normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... | execution_time | [271.680µs; 271.812µs] or [-0.024%; +0.024%] | None | None | None |
| normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... | throughput | [3679022.290op/s; 3680804.963op/s] or [-0.024%; +0.024%] | None | None | None |
| normalization/normalize_name/normalize_name/bad-name | execution_time | [26.616µs; 26.627µs] or [-0.019%; +0.019%] | None | None | None |
| normalization/normalize_name/normalize_name/bad-name | throughput | [37556528.469op/s; 37570879.136op/s] or [-0.019%; +0.019%] | None | None | None |
| normalization/normalize_name/normalize_name/good | execution_time | [16.151µs; 16.164µs] or [-0.040%; +0.040%] | None | None | None |
| normalization/normalize_name/normalize_name/good | throughput | [61865250.638op/s; 61914355.122op/s] or [-0.040%; +0.040%] | None | None | None |

Group 10

| scenario | metric | min | mean ± sd | median ± mad | p75 | p95 | p99 | max | peak_to_median_ratio | skewness | kurtosis | cv | sem | runs | sample_size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| benching string interning on wordpress profile | execution_time | 137.397µs | 138.854µs ± 0.375µs | 138.876µs ± 0.165µs | 139.031µs | 139.365µs | 139.931µs | 140.798µs | 1.38% | 0.273 | 5.627 | 0.27% | 0.027µs | 1 | 200 |

| scenario | metric | 95% CI mean | Shapiro-Wilk pvalue | Ljung-Box pvalue (lag=1) | Dip test pvalue |
| --- | --- | --- | --- | --- | --- |
| benching string interning on wordpress profile | execution_time | [138.802µs; 138.906µs] or [-0.037%; +0.037%] | None | None | None |

Group 11

| scenario | metric | min | mean ± sd | median ± mad | p75 | p95 | p99 | max | peak_to_median_ratio | skewness | kurtosis | cv | sem | runs | sample_size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| sql/obfuscate_sql_string | execution_time | 75.242µs | 75.448µs ± 0.144µs | 75.439µs ± 0.041µs | 75.479µs | 75.549µs | 75.693µs | 77.194µs | 2.33% | 9.148 | 106.778 | 0.19% | 0.010µs | 1 | 200 |

| scenario | metric | 95% CI mean | Shapiro-Wilk pvalue | Ljung-Box pvalue (lag=1) | Dip test pvalue |
| --- | --- | --- | --- | --- | --- |
| sql/obfuscate_sql_string | execution_time | [75.428µs; 75.468µs] or [-0.026%; +0.026%] | None | None | None |

Group 12

| scenario | metric | min | mean ± sd | median ± mad | p75 | p95 | p99 | max | peak_to_median_ratio | skewness | kurtosis | cv | sem | runs | sample_size |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| benching deserializing traces from msgpack to their internal representation | execution_time | 1.118µs | 1.183µs ± 0.025µs | 1.185µs ± 0.017µs | 1.203µs | 1.209µs | 1.211µs | 1.212µs | 2.22% | -1.072 | 0.352 | 2.10% | 0.002µs | 1 | 200 |

| scenario | metric | 95% CI mean | Shapiro-Wilk pvalue | Ljung-Box pvalue (lag=1) | Dip test pvalue |
| --- | --- | --- | --- | --- | --- |
| benching deserializing traces from msgpack to their internal representation | execution_time | [1.180µs; 1.187µs] or [-0.292%; +0.292%] | None | None | None |

Baseline

Omitted due to size.

@bantonsson (Contributor) left a comment

These booleans are woefully undocumented and I'm not completely sure about the expected life cycle, but from reading the code it looks like your interpretation is correct.

I think that it might be better to leave the worker as started if it's restartable and leave the checks in place.

```
@@ -296,7 +293,9 @@ impl TelemetryWorker {
self.log_err(&e);
}
self.data.started = false;
self.deadlines.clear_pending();
if !self.config.restartable {
```
Contributor

Wouldn't it be enough to also move self.data.started = false; inside the if statement, and keep the early-exit checks?

Contributor Author

I guess it depends. Do we want Start, Stop, Stop to produce two effective Stops? That is the sequence PHP ends up generating. Currently the second Stop is a noop, but if I moved the assignment self.data.started = false under the condition if !self.config.restartable, both Stops would be effective.

Contributor

If the early-exit checks are still in the code, then how would the second Stop be effective?

Contributor Author

@cataphract Jul 4, 2024

From what I understand, you're proposing that I reset started = false only if !restartable. In that case, started would stay true forever once a Start has been received. So the early check

```rust
if !self.data.started {
    return BREAK;
}
```

would never be hit.
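The difference between the two options in this thread can be reproduced with a small, hypothetical model of the Stop handler that replays Start, Stop, Stop on a worker assumed to be restartable. Only `started` and `restartable` correspond to fields mentioned above; `Variant`, `stops_handled` and `effective_stops` are illustrative names, and the counter merely stands in for the work Stop performs.

```rust
// Hypothetical model of the two Stop-handler variants discussed in this thread.
// `started` and `restartable` mirror the real fields; everything else is illustrative.
#[derive(Clone, Copy, Debug)]
enum Variant {
    /// This PR: always reset `started`; `restartable` only decides whether
    /// the periodic flushes are unscheduled.
    AlwaysResetStarted,
    /// Review suggestion: reset `started` only for non-restartable workers.
    ResetOnlyIfNotRestartable,
}

struct Worker {
    variant: Variant,
    restartable: bool,
    started: bool,
    stops_handled: u32, // how many Stops got past the early-exit check
}

impl Worker {
    fn new(variant: Variant, restartable: bool) -> Self {
        Worker { variant, restartable, started: false, stops_handled: 0 }
    }

    fn start(&mut self) {
        self.started = true;
    }

    fn stop(&mut self) {
        // The early-exit check quoted above.
        if !self.started {
            return;
        }
        self.stops_handled += 1; // stands in for the flush and other Stop work
        match self.variant {
            Variant::AlwaysResetStarted => self.started = false,
            Variant::ResetOnlyIfNotRestartable => {
                if !self.restartable {
                    self.started = false;
                }
            }
        }
    }
}

/// Replays Start, Stop, Stop (the sequence PHP produces) on a restartable worker.
fn effective_stops(variant: Variant) -> u32 {
    let mut w = Worker::new(variant, true);
    w.start();
    w.stop();
    w.stop();
    w.stops_handled
}

fn main() {
    for v in [Variant::AlwaysResetStarted, Variant::ResetOnlyIfNotRestartable] {
        println!("{:?}: {} effective Stop(s)", v, effective_stops(v));
    }
}
```

In this sketch, resetting `started` unconditionally makes the second Stop a no-op (1 effective Stop), while resetting it only for non-restartable workers lets both Stops through (2), because the early-exit check is never hit.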

Contributor

Yes, you're right. I was confusing the early checks.

There is something in the logic that feels a bit broken. FlushData is also guarded by this !self.data.started check. Should it work after a restartable stop?

The things that Stop does, should they happen when the first request ends, or when the worker stops?

Contributor Author

> Should [FlushData] work after a restartable stop?

My guess is yes; otherwise there is no way to send the data collected after the stop (build_observability_batch is called only from the handlers of Stop and FlushData), at least not without an intervening Start+Stop.

> The things that Stop does, should they happen when the first request ends, or when the worker stops?

That is a good point. The final flush of the metrics should happen when the worker stops, not when handling Stop. I guess at some point the two were the same, but then the restart mode was introduced. Regardless, once we have a way to send metrics after a Stop, periodic flushes must keep running for that to happen. So FlushData shouldn't be skipped or unscheduled after a Stop.
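A rough, hypothetical sketch of the lifecycle suggested here (not what this PR implements): Stop only ends the current Start/Stop cycle, periodic FlushData keeps running regardless of `started`, and the final flush moves to worker shutdown. `Worker`, `flush_data`, `shutdown` and `run` are illustrative names, not the real ddtelemetry API.

```rust
// Hypothetical sketch: Stop ends the Start/Stop cycle only, periodic FlushData
// keeps running, and the final flush happens when the worker itself shuts down.
#[derive(Default)]
struct Worker {
    started: bool,
    shut_down: bool,
    buffered: Vec<u32>,
    sent: Vec<u32>,
}

impl Worker {
    fn start(&mut self) {
        self.started = true;
    }

    /// End of a request: no flush, and nothing is unscheduled.
    fn stop(&mut self) {
        self.started = false;
    }

    fn add_point(&mut self, p: u32) {
        if !self.shut_down {
            self.buffered.push(p);
        }
    }

    /// Periodic FlushData: independent of `started`.
    fn flush_data(&mut self) {
        if !self.shut_down {
            self.sent.append(&mut self.buffered);
        }
    }

    /// Worker shutdown: the single place where the final flush happens.
    fn shutdown(&mut self) {
        self.flush_data();
        self.shut_down = true;
    }
}

/// Returns what had been sent after the first interval and after shutdown.
fn run() -> (Vec<u32>, Vec<u32>) {
    let mut w = Worker::default();
    w.start();
    w.add_point(1);
    w.stop(); // request 1 ends
    w.add_point(2);
    w.stop(); // request 2 ends
    assert!(!w.started, "Stop still closes the Start/Stop cycle");
    w.flush_data(); // the interval elapses
    let after_interval = w.sent.clone();
    w.add_point(3);
    w.shutdown();
    (after_interval, w.sent.clone())
}

fn main() {
    let (after_interval, after_shutdown) = run();
    println!("after interval: {:?}, after shutdown: {:?}", after_interval, after_shutdown);
}
```

Here the first interval sends `[1, 2]` and shutdown sends the remaining `3`.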

```
@@ -458,7 +454,9 @@ impl TelemetryWorker {
.await;

self.data.started = false;
self.deadlines.clear_pending();
if !self.config.restartable {
```
Contributor

Same here.

@cataphract force-pushed the glopes/flush-data-after-stop branch from 4f63a34 to 251bafa (July 4, 2024 15:39)
@codecov-commenter commented Jul 4, 2024

Codecov Report

Attention: Patch coverage is 83.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 70.24%. Comparing base (cc8ed56) to head (e0747f5).

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #515      +/-   ##
==========================================
+ Coverage   70.10%   70.24%   +0.13%
==========================================
  Files         214      214
  Lines       28801    28805       +4
==========================================
+ Hits        20192    20235      +43
+ Misses       8609     8570      -39
```
| Components | Coverage Δ |
| --- | --- |
| crashtracker | 21.20% <ø> (ø) |
| datadog-alloc | 98.73% <ø> (ø) |
| data-pipeline | 50.00% <ø> (ø) |
| data-pipeline-ffi | 0.00% <ø> (ø) |
| ddcommon | 83.07% <ø> (ø) |
| ddcommon-ffi | 70.20% <ø> (ø) |
| ddtelemetry | 59.01% <83.33%> (+0.05%) ⬆️ |
| ipc | 84.18% <ø> (ø) |
| profiling | 84.26% <ø> (+0.69%) ⬆️ |
| profiling-ffi | 77.42% <ø> (ø) |
| serverless | 0.00% <ø> (ø) |
| sidecar | 34.55% <ø> (ø) |
| sidecar-ffi | 0.00% <ø> (ø) |
| spawn-worker | 54.98% <ø> (ø) |
| trace-mini-agent | 70.88% <ø> (ø) |
| trace-normalization | 98.24% <ø> (ø) |
| trace-obfuscation | 95.73% <ø> (ø) |
| trace-protobuf | 77.16% <ø> (ø) |
| trace-utils | 90.90% <ø> (ø) |

@cataphract force-pushed the glopes/flush-data-after-stop branch 4 times, most recently from 447c409 to a7d11b0 (July 12, 2024 16:03)
@cataphract requested review from a team as code owners (July 12, 2024 16:03)
Telemetry workers are functionally dead after a Stop lifecycle action,
provided there's no intervening Start. While AddPoint actions are still
processed, their data is never flushed, since the Stop action handler
unschedules FlushMetrics and FlushData actions.

PHP sends a Stop action at the end of every request via
ddog_sidecar_telemetry_end(), but a Start action is only generated just
after a telemetry worker is spawned.

It is not clear to me whether the intention is to have a Start/Stop pair on
every PHP request (where Stop flushes the metrics) or if the intention
is to have only such a pair in the first request, with the Stop event
generated by ddog_sidecar_telemetry_end() effectively a noop. It would
appear, judging by [this
comment](#391):

> Also allow the telemetry worker to have a mode where it's continuing
execution after a start-stop cycle, otherwise it won't send any more
metrics afterwards.

that the intention is to keep sending metrics after a Start/Stop pair.
In that case:

* The Stop action handler should not unschedule FlushData and
  FlushMetrics events and
* FlushData, if called outside a Start-Stop pair, should not be a noop.

Finally: swap the order in which FlushData and FlushMetrics are
scheduled so that FlushMetrics runs first and therefore its generated
data can be sent by the next FlushData.

3 participants