telemetry worker: flush data after stops #515

cataphract · 2024-07-01T13:55:51Z

Telemetry workers are functionally dead after a Stop lifecycle action, provided there's no intervening Start. While AddPoint actions are still processed, their data is never flushed, since the Stop action handler unschedules FlushMetrics and FlushData actions.

PHP sends a Stop action at the end of every request via ddog_sidecar_telemetry_end(), but a Start action is only generated just after a telemetry worker is spawned. With no more Start actions generated, no metrics can effectively be sent after the first Stop.
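To make the failure mode concrete, here is a minimal sketch of the pre-fix lifecycle. It is a simplified model: the `Worker`, `Action`, `flushes_scheduled`, `buffered_points`, `pending_batch` and `sent` names are hypothetical stand-ins, not ddtelemetry's real types. The model only captures the two behaviours described above: Stop clears the periodic flushes, and FlushData exits early once `started` is false.

```rust
// Hypothetical, simplified model of the pre-fix lifecycle. The names only
// loosely mirror ddtelemetry's TelemetryWorker; this is not its real API.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Action {
    Start,
    AddPoint(u64),
    FlushMetrics,
    FlushData,
    Stop,
}

#[derive(Default)]
struct Worker {
    started: bool,
    flushes_scheduled: bool, // stands in for the pending deadlines
    buffered_points: Vec<u64>,
    pending_batch: Vec<u64>,
    sent: Vec<u64>,
}

impl Worker {
    fn dispatch(&mut self, action: Action) {
        match action {
            Action::Start => {
                self.started = true;
                self.flushes_scheduled = true;
            }
            // AddPoint is still processed after a Stop...
            Action::AddPoint(v) => self.buffered_points.push(v),
            Action::FlushMetrics => self.pending_batch.append(&mut self.buffered_points),
            Action::FlushData => {
                if !self.started {
                    return; // ...but FlushData is a noop outside a Start/Stop pair
                }
                self.sent.append(&mut self.pending_batch);
            }
            Action::Stop => {
                if !self.started {
                    return;
                }
                // Stop sends what it has...
                self.pending_batch.append(&mut self.buffered_points);
                self.sent.append(&mut self.pending_batch);
                self.started = false;
                // ...and then unschedules FlushMetrics and FlushData.
                self.flushes_scheduled = false;
            }
        }
    }

    /// One flush interval (pre-fix order: FlushData before FlushMetrics).
    fn tick(&mut self) {
        if self.flushes_scheduled {
            self.dispatch(Action::FlushData);
            self.dispatch(Action::FlushMetrics);
        }
    }
}
```

Replaying the PHP pattern (a single Start, then one Stop per request) shows that a point added after the first Stop is aggregated but never sent, whether it waits for the interval or for explicit flush actions.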

It is not clear to me whether the intention is to have a Start/Stop pair on every PHP request (where Stop flushes the metrics) or to have only one such pair, in the first request, with the Stop action generated by ddog_sidecar_telemetry_end() effectively a noop afterwards. It would appear, judging by this comment (#391):

> Also allow the telemetry worker to have a mode where it's continuing execution after a start-stop cycle, otherwise it won't send any more metrics afterwards.

that the intention is to keep sending metrics after a Start/Stop pair. This also makes more sense, since data is then flushed on the interval rather than after every request via Stop. In that case:

* The Stop action handler should not unschedule FlushData and FlushMetrics events, and
* FlushData, if called outside a Start-Stop pair, should not be a noop.
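The two changes above can be sketched on the same kind of simplified model. Again, every name here (`Worker`, `Action`, `flushes_scheduled`, `pending_batch`, ...) is hypothetical, and `restartable` only stands in for `config.restartable` from the diff. The sketch assumes FlushData's `started` guard is simply dropped, and that the periodic flushes are kept after Stop only for a restartable worker.

```rust
// Hypothetical model of the proposed Stop/FlushData behaviour; names are
// illustrative, not ddtelemetry's actual types.
#[derive(Debug, Clone, Copy)]
enum Action {
    Start,
    AddPoint(u64),
    FlushMetrics,
    FlushData,
    Stop,
}

struct Worker {
    restartable: bool,       // stands in for config.restartable
    started: bool,
    flushes_scheduled: bool, // stands in for the pending deadlines
    buffered_points: Vec<u64>,
    pending_batch: Vec<u64>,
    sent: Vec<u64>,
}

impl Worker {
    fn new(restartable: bool) -> Self {
        Worker {
            restartable,
            started: false,
            flushes_scheduled: false,
            buffered_points: Vec::new(),
            pending_batch: Vec::new(),
            sent: Vec::new(),
        }
    }

    fn dispatch(&mut self, action: Action) {
        match action {
            Action::Start => {
                self.started = true;
                self.flushes_scheduled = true;
            }
            Action::AddPoint(v) => self.buffered_points.push(v),
            Action::FlushMetrics => self.pending_batch.append(&mut self.buffered_points),
            // No `started` guard: data collected after a Stop can still be sent.
            Action::FlushData => self.sent.append(&mut self.pending_batch),
            Action::Stop => {
                if !self.started {
                    return; // a repeated Stop remains a noop
                }
                self.pending_batch.append(&mut self.buffered_points);
                self.sent.append(&mut self.pending_batch);
                self.started = false;
                // Only a non-restartable worker gives up its periodic flushes.
                if !self.restartable {
                    self.flushes_scheduled = false;
                }
            }
        }
    }

    /// One flush interval, with FlushMetrics ahead of FlushData.
    fn tick(&mut self) {
        if self.flushes_scheduled {
            self.dispatch(Action::FlushMetrics);
            self.dispatch(Action::FlushData);
        }
    }
}
```

With `restartable` set, points added after each Stop reach `sent` on the next interval. A non-restartable worker keeps the old one-shot behaviour.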

Finally: swap the order in which FlushData and FlushMetrics are scheduled so that FlushMetrics runs first and therefore its generated data can be sent by the next FlushData.
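Why the order matters can be seen on a two-stage pipeline: FlushMetrics turns aggregated points into a batch, and FlushData sends whichever batch is ready. The `Pipeline` type and its methods below are hypothetical illustrations, not the worker's real code. With FlushData running first, a batch produced in one interval waits a full extra interval before it is sent.

```rust
// Hypothetical two-stage pipeline: flush_metrics moves aggregated points into
// a batch, flush_data sends whatever batch is ready. Not ddtelemetry's types.
#[derive(Default)]
struct Pipeline {
    buffered_points: Vec<u64>,
    pending_batch: Vec<u64>,
    sent: Vec<u64>,
}

impl Pipeline {
    fn flush_metrics(&mut self) {
        self.pending_batch.append(&mut self.buffered_points);
    }

    fn flush_data(&mut self) {
        self.sent.append(&mut self.pending_batch);
    }

    /// Runs one flush interval in the requested order.
    fn interval(&mut self, metrics_first: bool) {
        if metrics_first {
            self.flush_metrics();
            self.flush_data();
        } else {
            // Old order: the batch built by flush_metrics waits for the next interval.
            self.flush_data();
            self.flush_metrics();
        }
    }
}
```

In the old order a point needs two intervals to be sent; with FlushMetrics first it goes out in the same interval.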

pr-commenter · 2024-07-01T14:00:11Z

Benchmarks

Comparison

Benchmark execution time: 2024-10-10 12:22:42

Comparing candidate commit 9c34541 in PR branch glopes/flush-data-after-stop with baseline commit f363618 in branch main.

Found 1 performance improvement and 0 performance regressions! Performance is the same for 50 metrics, 2 unstable metrics.

scenario:benching deserializing traces from msgpack to their internal representation

🟩 execution_time [-36.298ns; -26.456ns] or [-2.989%; -2.178%]

Candidate

Candidate benchmark details

Group 1

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`9c34541`	1728392591	glopes/flush-data-after-stop

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
write only interface	execution_time	1.373µs	3.170µs ± 1.563µs	3.014µs ± 0.020µs	3.030µs	3.084µs	13.789µs	18.128µs	501.39%	8.034	65.437	49.17%	0.110µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
write only interface	execution_time	[2.954µs; 3.387µs] or [-6.831%; +6.831%]	None	None	None

Group 2

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`9c34541`	1728392591	glopes/flush-data-after-stop

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
redis/obfuscate_redis_string	execution_time	38.215µs	38.806µs ± 1.011µs	38.342µs ± 0.054µs	38.443µs	40.955µs	41.021µs	42.037µs	9.64%	1.699	1.010	2.60%	0.071µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
redis/obfuscate_redis_string	execution_time	[38.666µs; 38.946µs] or [-0.361%; +0.361%]	None	None	None

Group 3

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`9c34541`	1728392591	glopes/flush-data-after-stop

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
credit_card/is_card_number/	execution_time	2.013µs	2.014µs ± 0.001µs	2.014µs ± 0.001µs	2.015µs	2.016µs	2.017µs	2.030µs	0.80%	8.000	89.054	0.07%	0.000µs	1	200
credit_card/is_card_number/	throughput	492569834.547op/s	496447840.307op/s ± 334865.222op/s	496505188.483op/s ± 123506.989op/s	496619337.627op/s	496687469.719op/s	496726409.388op/s	496745392.224op/s	0.05%	-7.946	88.183	0.07%	23678.547op/s	1	200
credit_card/is_card_number/ 3782-8224-6310-005	execution_time	124.089µs	125.595µs ± 0.547µs	125.586µs ± 0.373µs	125.988µs	126.420µs	126.781µs	127.068µs	1.18%	-0.105	-0.005	0.43%	0.039µs	1	200
credit_card/is_card_number/ 3782-8224-6310-005	throughput	7869819.028op/s	7962264.477op/s ± 34691.802op/s	7962670.999op/s ± 23698.266op/s	7985202.822op/s	8021305.043op/s	8043513.216op/s	8058710.933op/s	1.21%	0.131	0.004	0.43%	2453.081op/s	1	200
credit_card/is_card_number/ 378282246310005	execution_time	114.381µs	116.904µs ± 0.575µs	116.953µs ± 0.310µs	117.239µs	117.816µs	118.206µs	118.296µs	1.15%	-0.716	2.293	0.49%	0.041µs	1	200
credit_card/is_card_number/ 378282246310005	throughput	8453344.615op/s	8554236.542op/s ± 42216.863op/s	8550452.267op/s ± 22586.917op/s	8575683.451op/s	8617377.603op/s	8682438.973op/s	8742733.873op/s	2.25%	0.772	2.458	0.49%	2985.183op/s	1	200
credit_card/is_card_number/37828224631	execution_time	2.013µs	2.014µs ± 0.001µs	2.014µs ± 0.000µs	2.014µs	2.015µs	2.016µs	2.031µs	0.85%	10.450	130.863	0.07%	0.000µs	1	200
credit_card/is_card_number/37828224631	throughput	492283981.525op/s	496479933.054op/s ± 329401.667op/s	496481605.357op/s ± 118257.981op/s	496633333.646op/s	496704828.654op/s	496731563.344op/s	496761942.572op/s	0.06%	-10.403	130.059	0.07%	23292.215op/s	1	200
credit_card/is_card_number/378282246310005	execution_time	110.756µs	113.104µs ± 0.651µs	113.120µs ± 0.426µs	113.530µs	114.231µs	114.602µs	114.652µs	1.35%	-0.175	0.484	0.57%	0.046µs	1	200
credit_card/is_card_number/378282246310005	throughput	8722069.777op/s	8841734.623op/s ± 50978.375op/s	8840143.392op/s ± 33286.037op/s	8874497.138op/s	8924448.101op/s	8966107.410op/s	9028872.712op/s	2.13%	0.218	0.540	0.58%	3604.715op/s	1	200
credit_card/is_card_number/37828224631000521389798	execution_time	112.386µs	113.477µs ± 0.373µs	113.405µs ± 0.230µs	113.701µs	114.164µs	114.383µs	114.747µs	1.18%	0.450	0.444	0.33%	0.026µs	1	200
credit_card/is_card_number/37828224631000521389798	throughput	8714849.536op/s	8812475.188op/s ± 28908.905op/s	8817969.953op/s ± 17896.677op/s	8833225.016op/s	8849580.709op/s	8878125.069op/s	8897908.102op/s	0.91%	-0.428	0.433	0.33%	2044.168op/s	1	200
credit_card/is_card_number/x371413321323331	execution_time	23.346µs	24.131µs ± 0.400µs	24.103µs ± 0.290µs	24.396µs	24.885µs	24.995µs	25.054µs	3.95%	0.261	-0.573	1.65%	0.028µs	1	200
credit_card/is_card_number/x371413321323331	throughput	39914108.685op/s	41452492.784op/s ± 684176.709op/s	41489073.853op/s ± 499466.076op/s	41972511.327op/s	42497800.329op/s	42741689.923op/s	42834022.806op/s	3.24%	-0.194	-0.608	1.65%	48378.599op/s	1	200
credit_card/is_card_number_no_luhn/	execution_time	2.013µs	2.014µs ± 0.001µs	2.014µs ± 0.000µs	2.015µs	2.015µs	2.016µs	2.018µs	0.19%	1.205	6.065	0.03%	0.000µs	1	200
credit_card/is_card_number_no_luhn/	throughput	495522927.891op/s	496490488.487op/s ± 156187.759op/s	496474059.233op/s ± 108697.147op/s	496621523.287op/s	496702182.624op/s	496751028.167op/s	496810059.158op/s	0.07%	-1.199	6.021	0.03%	11044.142op/s	1	200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005	execution_time	97.419µs	98.764µs ± 0.617µs	98.716µs ± 0.424µs	99.159µs	99.856µs	100.425µs	100.850µs	2.16%	0.560	0.262	0.62%	0.044µs	1	200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005	throughput	9915754.004op/s	10125531.944op/s ± 63065.723op/s	10130024.830op/s ± 43591.617op/s	10172750.254op/s	10217405.759op/s	10233464.697op/s	10264904.994op/s	1.33%	-0.524	0.195	0.62%	4459.420op/s	1	200
credit_card/is_card_number_no_luhn/ 378282246310005	execution_time	88.987µs	90.463µs ± 0.596µs	90.435µs ± 0.424µs	90.864µs	91.515µs	91.765µs	91.942µs	1.67%	0.101	-0.441	0.66%	0.042µs	1	200
credit_card/is_card_number_no_luhn/ 378282246310005	throughput	10876366.674op/s	11054721.201op/s ± 72846.602op/s	11057685.004op/s ± 51765.453op/s	11109115.309op/s	11172475.008op/s	11204475.884op/s	11237544.169op/s	1.63%	-0.071	-0.445	0.66%	5151.033op/s	1	200
credit_card/is_card_number_no_luhn/37828224631	execution_time	2.013µs	2.014µs ± 0.001µs	2.014µs ± 0.001µs	2.015µs	2.016µs	2.018µs	2.031µs	0.85%	8.450	95.231	0.07%	0.000µs	1	200
credit_card/is_card_number_no_luhn/37828224631	throughput	492339924.573op/s	496460274.099op/s ± 350246.388op/s	496500546.156op/s ± 124901.433op/s	496627571.209op/s	496706878.415op/s	496728749.558op/s	496853859.319op/s	0.07%	-8.395	94.337	0.07%	24766.160op/s	1	200
credit_card/is_card_number_no_luhn/378282246310005	execution_time	85.315µs	86.489µs ± 0.473µs	86.514µs ± 0.334µs	86.810µs	87.282µs	87.518µs	87.645µs	1.31%	0.099	-0.407	0.55%	0.033µs	1	200
credit_card/is_card_number_no_luhn/378282246310005	throughput	11409683.064op/s	11562569.119op/s ± 63229.962op/s	11558828.033op/s ± 44523.768op/s	11606376.849op/s	11665720.826op/s	11688244.458op/s	11721334.567op/s	1.41%	-0.073	-0.414	0.55%	4471.033op/s	1	200
credit_card/is_card_number_no_luhn/37828224631000521389798	execution_time	112.821µs	113.563µs ± 0.373µs	113.529µs ± 0.284µs	113.838µs	114.234µs	114.428µs	114.436µs	0.80%	0.419	-0.554	0.33%	0.026µs	1	200
credit_card/is_card_number_no_luhn/37828224631000521389798	throughput	8738508.003op/s	8805806.982op/s ± 28847.328op/s	8808295.953op/s ± 22054.013op/s	8829217.707op/s	8845117.725op/s	8854729.572op/s	8863606.034op/s	0.63%	-0.407	-0.566	0.33%	2039.814op/s	1	200
credit_card/is_card_number_no_luhn/x371413321323331	execution_time	23.252µs	24.099µs ± 0.372µs	24.080µs ± 0.286µs	24.383µs	24.807µs	24.918µs	25.040µs	3.99%	0.259	-0.510	1.54%	0.026µs	1	200
credit_card/is_card_number_no_luhn/x371413321323331	throughput	39935387.430op/s	41504850.644op/s ± 638941.067op/s	41527712.410op/s ± 497742.362op/s	42008739.157op/s	42501098.186op/s	42704420.657op/s	43007721.700op/s	3.56%	-0.193	-0.545	1.54%	45179.956op/s	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
credit_card/is_card_number/	execution_time	[2.014µs; 2.015µs] or [-0.009%; +0.009%]	None	None	None
credit_card/is_card_number/	throughput	[496401431.208op/s; 496494249.406op/s] or [-0.009%; +0.009%]	None	None	None
credit_card/is_card_number/ 3782-8224-6310-005	execution_time	[125.519µs; 125.671µs] or [-0.060%; +0.060%]	None	None	None
credit_card/is_card_number/ 3782-8224-6310-005	throughput	[7957456.526op/s; 7967072.427op/s] or [-0.060%; +0.060%]	None	None	None
credit_card/is_card_number/ 378282246310005	execution_time	[116.824µs; 116.984µs] or [-0.068%; +0.068%]	None	None	None
credit_card/is_card_number/ 378282246310005	throughput	[8548385.691op/s; 8560087.393op/s] or [-0.068%; +0.068%]	None	None	None
credit_card/is_card_number/37828224631	execution_time	[2.014µs; 2.014µs] or [-0.009%; +0.009%]	None	None	None
credit_card/is_card_number/37828224631	throughput	[496434281.151op/s; 496525584.957op/s] or [-0.009%; +0.009%]	None	None	None
credit_card/is_card_number/378282246310005	execution_time	[113.013µs; 113.194µs] or [-0.080%; +0.080%]	None	None	None
credit_card/is_card_number/378282246310005	throughput	[8834669.511op/s; 8848799.736op/s] or [-0.080%; +0.080%]	None	None	None
credit_card/is_card_number/37828224631000521389798	execution_time	[113.425µs; 113.528µs] or [-0.046%; +0.046%]	None	None	None
credit_card/is_card_number/37828224631000521389798	throughput	[8808468.692op/s; 8816481.684op/s] or [-0.045%; +0.045%]	None	None	None
credit_card/is_card_number/x371413321323331	execution_time	[24.075µs; 24.186µs] or [-0.230%; +0.230%]	None	None	None
credit_card/is_card_number/x371413321323331	throughput	[41357672.472op/s; 41547313.096op/s] or [-0.229%; +0.229%]	None	None	None
credit_card/is_card_number_no_luhn/	execution_time	[2.014µs; 2.014µs] or [-0.004%; +0.004%]	None	None	None
credit_card/is_card_number_no_luhn/	throughput	[496468842.366op/s; 496512134.608op/s] or [-0.004%; +0.004%]	None	None	None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005	execution_time	[98.679µs; 98.850µs] or [-0.087%; +0.087%]	None	None	None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005	throughput	[10116791.642op/s; 10134272.247op/s] or [-0.086%; +0.086%]	None	None	None
credit_card/is_card_number_no_luhn/ 378282246310005	execution_time	[90.380µs; 90.546µs] or [-0.091%; +0.091%]	None	None	None
credit_card/is_card_number_no_luhn/ 378282246310005	throughput	[11044625.363op/s; 11064817.039op/s] or [-0.091%; +0.091%]	None	None	None
credit_card/is_card_number_no_luhn/37828224631	execution_time	[2.014µs; 2.014µs] or [-0.010%; +0.010%]	None	None	None
credit_card/is_card_number_no_luhn/37828224631	throughput	[496411733.319op/s; 496508814.880op/s] or [-0.010%; +0.010%]	None	None	None
credit_card/is_card_number_no_luhn/378282246310005	execution_time	[86.423µs; 86.554µs] or [-0.076%; +0.076%]	None	None	None
credit_card/is_card_number_no_luhn/378282246310005	throughput	[11553806.055op/s; 11571332.184op/s] or [-0.076%; +0.076%]	None	None	None
credit_card/is_card_number_no_luhn/37828224631000521389798	execution_time	[113.511µs; 113.614µs] or [-0.045%; +0.045%]	None	None	None
credit_card/is_card_number_no_luhn/37828224631000521389798	throughput	[8801809.019op/s; 8809804.944op/s] or [-0.045%; +0.045%]	None	None	None
credit_card/is_card_number_no_luhn/x371413321323331	execution_time	[24.048µs; 24.151µs] or [-0.214%; +0.214%]	None	None	None
credit_card/is_card_number_no_luhn/x371413321323331	throughput	[41416299.557op/s; 41593401.730op/s] or [-0.213%; +0.213%]	None	None	None

Group 4

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`9c34541`	1728392591	glopes/flush-data-after-stop

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
normalization/normalize_trace/test_trace	execution_time	292.880ns	303.793ns ± 15.063ns	296.193ns ± 2.732ns	311.811ns	335.737ns	352.795ns	353.964ns	19.50%	1.727	2.335	4.95%	1.065ns	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
normalization/normalize_trace/test_trace	execution_time	[301.705ns; 305.880ns] or [-0.687%; +0.687%]	None	None	None

Group 5

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`9c34541`	1728392591	glopes/flush-data-after-stop

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
tags/replace_trace_tags	execution_time	2.736µs	2.765µs ± 0.015µs	2.762µs ± 0.010µs	2.774µs	2.798µs	2.802µs	2.803µs	1.50%	0.657	-0.193	0.56%	0.001µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
tags/replace_trace_tags	execution_time	[2.763µs; 2.767µs] or [-0.078%; +0.078%]	None	None	None

Group 6

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`9c34541`	1728392591	glopes/flush-data-after-stop

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
two way interface	execution_time	17.898µs	22.125µs ± 7.728µs	18.396µs ± 0.115µs	18.621µs	35.829µs	37.249µs	74.542µs	305.20%	2.485	9.472	34.84%	0.546µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
two way interface	execution_time	[21.054µs; 23.196µs] or [-4.841%; +4.841%]	None	None	None

Group 7

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`9c34541`	1728392591	glopes/flush-data-after-stop

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...	execution_time	506.041µs	507.164µs ± 1.061µs	506.987µs ± 0.335µs	507.393µs	508.192µs	508.832µs	519.908µs	2.55%	8.799	102.376	0.21%	0.075µs	1	200
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...	throughput	1923418.090op/s	1971756.258op/s ± 4051.034op/s	1972438.651op/s ± 1301.890op/s	1973434.095op/s	1974965.030op/s	1975732.308op/s	1976126.072op/s	0.19%	-8.631	99.644	0.20%	286.451op/s	1	200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて	execution_time	468.243µs	468.899µs ± 0.260µs	468.904µs ± 0.184µs	469.076µs	469.327µs	469.512µs	469.733µs	0.18%	0.148	0.002	0.06%	0.018µs	1	200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて	throughput	2128869.907op/s	2132654.388op/s ± 1184.589op/s	2132632.094op/s ± 837.333op/s	2133476.943op/s	2134399.413op/s	2135307.399op/s	2135643.965op/s	0.14%	-0.144	-0.001	0.06%	83.763op/s	1	200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters	execution_time	180.256µs	180.663µs ± 0.151µs	180.669µs ± 0.094µs	180.756µs	180.905µs	181.029µs	181.130µs	0.25%	0.051	0.284	0.08%	0.011µs	1	200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters	throughput	5520910.719op/s	5535170.402op/s ± 4628.020op/s	5534975.273op/s ± 2869.325op/s	5538001.292op/s	5542893.204op/s	5545264.837op/s	5547673.799op/s	0.23%	-0.045	0.281	0.08%	327.250op/s	1	200
normalization/normalize_service/normalize_service/[empty string]	execution_time	44.313µs	44.469µs ± 0.059µs	44.466µs ± 0.028µs	44.494µs	44.540µs	44.584µs	44.895µs	0.96%	3.066	20.140	0.13%	0.004µs	1	200
normalization/normalize_service/normalize_service/[empty string]	throughput	22274360.726op/s	22487517.623op/s ± 29888.898op/s	22488883.815op/s ± 14036.330op/s	22503289.348op/s	22525731.778op/s	22540034.887op/s	22566652.724op/s	0.35%	-3.016	19.726	0.13%	2113.464op/s	1	200
normalization/normalize_service/normalize_service/test_ASCII	execution_time	49.046µs	49.166µs ± 0.047µs	49.163µs ± 0.030µs	49.194µs	49.246µs	49.271µs	49.405µs	0.49%	0.694	2.575	0.10%	0.003µs	1	200
normalization/normalize_service/normalize_service/test_ASCII	throughput	20240788.923op/s	20339298.774op/s ± 19509.150op/s	20340676.958op/s ± 12303.708op/s	20352090.027op/s	20364989.955op/s	20383088.972op/s	20388860.886op/s	0.24%	-0.682	2.528	0.10%	1379.505op/s	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...	execution_time	[507.017µs; 507.311µs] or [-0.029%; +0.029%]	None	None	None
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000...	throughput	[1971194.823op/s; 1972317.692op/s] or [-0.028%; +0.028%]	None	None	None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて	execution_time	[468.863µs; 468.935µs] or [-0.008%; +0.008%]	None	None	None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて	throughput	[2132490.216op/s; 2132818.561op/s] or [-0.008%; +0.008%]	None	None	None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters	execution_time	[180.642µs; 180.684µs] or [-0.012%; +0.012%]	None	None	None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters	throughput	[5534529.003op/s; 5535811.801op/s] or [-0.012%; +0.012%]	None	None	None
normalization/normalize_service/normalize_service/[empty string]	execution_time	[44.461µs; 44.477µs] or [-0.018%; +0.018%]	None	None	None
normalization/normalize_service/normalize_service/[empty string]	throughput	[22483375.309op/s; 22491659.937op/s] or [-0.018%; +0.018%]	None	None	None
normalization/normalize_service/normalize_service/test_ASCII	execution_time	[49.159µs; 49.172µs] or [-0.013%; +0.013%]	None	None	None
normalization/normalize_service/normalize_service/test_ASCII	throughput	[20336594.993op/s; 20342002.554op/s] or [-0.013%; +0.013%]	None	None	None

Group 8

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`9c34541`	1728392591	glopes/flush-data-after-stop

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
concentrator/add_spans_to_concentrator	execution_time	8.910ms	8.943ms ± 0.019ms	8.941ms ± 0.010ms	8.952ms	8.966ms	8.994ms	9.113ms	1.93%	3.835	29.671	0.22%	0.001ms	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
concentrator/add_spans_to_concentrator	execution_time	[8.940ms; 8.946ms] or [-0.030%; +0.030%]	None	None	None

Group 9

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`9c34541`	1728392591	glopes/flush-data-after-stop

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...	execution_time	270.685µs	271.746µs ± 0.476µs	271.746µs ± 0.249µs	271.935µs	272.571µs	273.149µs	274.243µs	0.92%	1.188	3.736	0.17%	0.034µs	1	200
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...	throughput	3646404.287op/s	3679913.627op/s ± 6431.444op/s	3679907.649op/s ± 3377.292op/s	3683922.970op/s	3688796.684op/s	3692206.660op/s	3694331.049op/s	0.39%	-1.165	3.631	0.17%	454.772op/s	1	200
normalization/normalize_name/normalize_name/bad-name	execution_time	26.573µs	26.621µs ± 0.037µs	26.624µs ± 0.031µs	26.641µs	26.690µs	26.724µs	26.742µs	0.44%	0.736	0.183	0.14%	0.003µs	1	200
normalization/normalize_name/normalize_name/bad-name	throughput	37394024.296op/s	37563703.802op/s ± 51773.677op/s	37560200.665op/s ± 43732.428op/s	37612736.997op/s	37625771.326op/s	37628905.132op/s	37631553.640op/s	0.19%	-0.729	0.165	0.14%	3660.952op/s	1	200
normalization/normalize_name/normalize_name/good	execution_time	16.115µs	16.158µs ± 0.046µs	16.153µs ± 0.033µs	16.185µs	16.266µs	16.296µs	16.323µs	1.05%	1.372	1.476	0.29%	0.003µs	1	200
normalization/normalize_name/normalize_name/good	throughput	61261759.347op/s	61889802.880op/s ± 177156.896op/s	61907973.047op/s ± 128308.949op/s	62036504.638op/s	62045048.910op/s	62047819.147op/s	62055450.905op/s	0.24%	-1.359	1.428	0.29%	12526.884op/s	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...	execution_time	[271.680µs; 271.812µs] or [-0.024%; +0.024%]	None	None	None
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo...	throughput	[3679022.290op/s; 3680804.963op/s] or [-0.024%; +0.024%]	None	None	None
normalization/normalize_name/normalize_name/bad-name	execution_time	[26.616µs; 26.627µs] or [-0.019%; +0.019%]	None	None	None
normalization/normalize_name/normalize_name/bad-name	throughput	[37556528.469op/s; 37570879.136op/s] or [-0.019%; +0.019%]	None	None	None
normalization/normalize_name/normalize_name/good	execution_time	[16.151µs; 16.164µs] or [-0.040%; +0.040%]	None	None	None
normalization/normalize_name/normalize_name/good	throughput	[61865250.638op/s; 61914355.122op/s] or [-0.040%; +0.040%]	None	None	None

Group 10

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`9c34541`	1728392591	glopes/flush-data-after-stop

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
benching string interning on wordpress profile	execution_time	137.397µs	138.854µs ± 0.375µs	138.876µs ± 0.165µs	139.031µs	139.365µs	139.931µs	140.798µs	1.38%	0.273	5.627	0.27%	0.027µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
benching string interning on wordpress profile	execution_time	[138.802µs; 138.906µs] or [-0.037%; +0.037%]	None	None	None

Group 11

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`9c34541`	1728392591	glopes/flush-data-after-stop

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
sql/obfuscate_sql_string	execution_time	75.242µs	75.448µs ± 0.144µs	75.439µs ± 0.041µs	75.479µs	75.549µs	75.693µs	77.194µs	2.33%	9.148	106.778	0.19%	0.010µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
sql/obfuscate_sql_string	execution_time	[75.428µs; 75.468µs] or [-0.026%; +0.026%]	None	None	None

Group 12

cpu_model	git_commit_sha	git_commit_date	git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz	`9c34541`	1728392591	glopes/flush-data-after-stop

scenario	metric	min	mean ± sd	median ± mad	p75	p95	p99	max	peak_to_median_ratio	skewness	kurtosis	cv	sem	runs	sample_size
benching deserializing traces from msgpack to their internal representation	execution_time	1.118µs	1.183µs ± 0.025µs	1.185µs ± 0.017µs	1.203µs	1.209µs	1.211µs	1.212µs	2.22%	-1.072	0.352	2.10%	0.002µs	1	200

scenario	metric	95% CI mean	Shapiro-Wilk pvalue	Ljung-Box pvalue (lag=1)	Dip test pvalue
benching deserializing traces from msgpack to their internal representation	execution_time	[1.180µs; 1.187µs] or [-0.292%; +0.292%]	None	None	None

Baseline

Omitted due to size.

bantonsson

These booleans are woefully undocumented and I'm not completely sure about the expected life cycle, but from reading the code it looks like your interpretation is correct.

I think that it might be better to leave the worker as started if it's restartable and leave the checks in place.

bantonsson · 2024-07-04T11:46:29Z

ddtelemetry/src/worker/mod.rs

@@ -296,7 +293,9 @@ impl TelemetryWorker {
                    self.log_err(&e);
                }
                self.data.started = false;
-                self.deadlines.clear_pending();
+                if !self.config.restartable {


Wouldn't it be enough to also move self.data.started = false; inside the if statement, and keep the early-exit checks?

I guess it depends. Do we want Start, Stop, Stop to generate two stops? Because that's what PHP ends up generating. Currently the second Stop is a noop, but if I moved the assignment self.data.started = false under the condition if !self.config.restartable, then both stops would be effective.
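The trade-off can be counted on the Start, Stop, Stop sequence that PHP produces. In this hypothetical sketch, `started` and `restartable` mirror the fields in the diff, while `StopDesign` and `effective_stops` are made-up names for illustration. A Stop counts as effective when it gets past the `!started` early exit and runs the full handler.

```rust
// Hypothetical comparison of the two Stop designs discussed in this thread;
// `started` and `restartable` mirror the diff, everything else is illustrative.
#[derive(Clone, Copy)]
enum Action {
    Start,
    Stop,
}

#[derive(Clone, Copy)]
enum StopDesign {
    /// Always reset `started`; keep the flush deadlines if restartable.
    AlwaysReset,
    /// Reset `started` only when the worker is not restartable.
    ResetOnlyIfNotRestartable,
}

/// Counts the Stop actions that get past the `!started` early exit and
/// therefore run the full Stop handler.
fn effective_stops(design: StopDesign, restartable: bool, actions: &[Action]) -> usize {
    let mut started = false;
    let mut effective = 0;
    for action in actions {
        match action {
            Action::Start => started = true,
            Action::Stop => {
                if !started {
                    continue; // early exit: this Stop is a noop
                }
                effective += 1;
                let reset = match design {
                    StopDesign::AlwaysReset => true,
                    StopDesign::ResetOnlyIfNotRestartable => !restartable,
                };
                if reset {
                    started = false;
                }
            }
        }
    }
    effective
}
```

For a restartable worker, resetting `started` only when not restartable means `started` never becomes false again after the first Start, so the early exit is never hit and every Stop runs in full.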

If the exit early checks are still in the code, then how would the second Stop be effective?

From what I understand, what you're proposing is that I reset started = false only if !restartable. In that case started would stay true forever once there is a start. So the early check

if !self.data.started { return BREAK; }

would never be hit.

Yes, you're right. I was confusing the early checks.

There is something in the logic that feels a bit broken. FlushData is also protected by this !self.data.started check. Should it work after a restartable stop?

The things that Stop does, should they happen when the first request ends, or when the worker stops?

> Should [FlushData] work after a restartable stop?

My guess is yes, otherwise there is no way to send the data that's collected after the stop (build_observability_batch is called only from the handlers of Stop and FlushData), at least not without an intervening start+stop.

> The things that Stop does, should they happen when the first request ends, or when the worker stops?

That is a good point. The final flush of the metrics should happen when the worker stops, not when handling Stop. I guess at some point the two were the same, but then the restart mode was introduced. Regardless, if metrics are to be sent after a Stop, periodic flushes must keep happening for that to work, so FlushData shouldn't be skipped or unscheduled after a Stop.
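One way to picture the separation described in the last reply is to tie the final flush to worker shutdown and let a restartable Stop only end the Start/Stop cycle. This is a hypothetical sketch, not what the PR implements: the `Shutdown` action, the `Worker` fields and `send_batch` are made-up names, and the real Stop handler still builds and sends a batch.

```rust
// Hypothetical sketch: the final flush belongs to the worker shutting down
// (a made-up `Shutdown` action), while a restartable Stop only ends the
// Start/Stop cycle and leaves the periodic flushes scheduled.
#[derive(Clone, Copy)]
enum Action {
    Start,
    AddPoint(u64),
    Stop,
    Shutdown,
}

#[derive(Default)]
struct Worker {
    started: bool,
    flushes_scheduled: bool, // stands in for the pending deadlines
    shut_down: bool,
    buffered_points: Vec<u64>,
    sent: Vec<Vec<u64>>, // one entry per batch sent
}

impl Worker {
    fn send_batch(&mut self) {
        if !self.buffered_points.is_empty() {
            let batch = std::mem::take(&mut self.buffered_points);
            self.sent.push(batch);
        }
    }

    fn dispatch(&mut self, action: Action) {
        if self.shut_down {
            return; // nothing is processed after the final flush
        }
        match action {
            Action::Start => {
                self.started = true;
                self.flushes_scheduled = true;
            }
            Action::AddPoint(v) => self.buffered_points.push(v),
            // Restartable Stop: end the cycle, keep the deadlines, no final flush.
            Action::Stop => self.started = false,
            Action::Shutdown => {
                self.send_batch(); // the one and only final flush
                self.flushes_scheduled = false;
                self.shut_down = true;
            }
        }
    }

    /// One periodic flush interval.
    fn tick(&mut self) {
        if self.flushes_scheduled {
            self.send_batch();
        }
    }
}
```

Under this split, data collected after a Stop goes out on the next periodic flush, and only shutdown unschedules the flushes.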

bantonsson · 2024-07-04T11:47:18Z

ddtelemetry/src/worker/mod.rs

@@ -458,7 +454,9 @@ impl TelemetryWorker {
                .await;

                self.data.started = false;
-                self.deadlines.clear_pending();
+                if !self.config.restartable {


codecov-commenter · 2024-07-04T15:46:14Z

Codecov Report

Attention: Patch coverage is 83.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 70.24%. Comparing base (cc8ed56) to head (e0747f5).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #515      +/-   ##
==========================================
+ Coverage   70.10%   70.24%   +0.13%     
==========================================
  Files         214      214              
  Lines       28801    28805       +4     
==========================================
+ Hits        20192    20235      +43     
+ Misses       8609     8570      -39

Components	Coverage Δ
crashtracker	`21.20% <ø> (ø)`
datadog-alloc	`98.73% <ø> (ø)`
data-pipeline	`50.00% <ø> (ø)`
data-pipeline-ffi	`0.00% <ø> (ø)`
ddcommon	`83.07% <ø> (ø)`
ddcommon-ffi	`70.20% <ø> (ø)`
ddtelemetry	`59.01% <83.33%> (+0.05%)`	⬆️
ipc	`84.18% <ø> (ø)`
profiling	`84.26% <ø> (+0.69%)`	⬆️
profiling-ffi	`77.42% <ø> (ø)`
serverless	`0.00% <ø> (ø)`
sidecar	`34.55% <ø> (ø)`
sidecar-ffi	`0.00% <ø> (ø)`
spawn-worker	`54.98% <ø> (ø)`
trace-mini-agent	`70.88% <ø> (ø)`
trace-normalization	`98.24% <ø> (ø)`
trace-obfuscation	`95.73% <ø> (ø)`
trace-protobuf	`77.16% <ø> (ø)`
trace-utils	`90.90% <ø> (ø)`


cataphract requested a review from a team as a code owner July 1, 2024 13:55

cataphract requested review from pawelchcki, bwoebi and bantonsson and removed request for a team and pawelchcki July 1, 2024 13:55

github-actions bot added the telemetry label Jul 1, 2024

bantonsson reviewed Jul 4, 2024


cataphract force-pushed the glopes/flush-data-after-stop branch from 4f63a34 to 251bafa July 4, 2024 15:39

cataphract force-pushed the glopes/flush-data-after-stop branch 4 times, most recently from 447c409 to a7d11b0 July 12, 2024 16:03

cataphract requested review from a team as code owners July 12, 2024 16:03

github-actions bot added the mini-agent label Jul 12, 2024

github-actions bot removed the mini-agent label Aug 1, 2024

cataphract force-pushed the glopes/flush-data-after-stop branch from e0747f5 to 9c34541 October 10, 2024 12:12
