Display ZGC collector stats in report #1502

Merged: 11 commits, Jun 2, 2022

Contributor

@j-bennet j-bennet commented May 26, 2022

Following up on #1499.

This adds ZGC stats to the race report and comparison report.

I'm not very fond of all the hardcoded metric names; this can probably be improved.

Related PR: elastic/rally-teams#74.

@j-bennet j-bennet marked this pull request as draft May 26, 2022 00:22
@j-bennet j-bennet marked this pull request as ready for review May 26, 2022 18:27
@j-bennet j-bennet marked this pull request as draft May 26, 2022 20:59
@j-bennet j-bennet marked this pull request as ready for review May 26, 2022 22:27
Member

@pquentin pquentin left a comment

This is great, thank you! Have you been able to run scripts/analyze.py? It fails here on something unrelated:

$ python scripts/analyze.py ~/.rally/benchmarks/races/da9e072b-ebe0-4380-817c-897901fb90d4/race.json
Traceback (most recent call last):
  File "scripts/analyze.py", line 269, in <module>
    main()
  File "scripts/analyze.py", line 265, in main
    plot(series, args.label)
  File "scripts/analyze.py", line 235, in plot
    plot_service_time(raw_data, label_key)
  File "scripts/analyze.py", line 89, in plot_service_time
    "percentiles": [decode_percentile_key(p) for p in service_time_metrics.keys()],
  File "scripts/analyze.py", line 89, in <listcomp>
    "percentiles": [decode_percentile_key(p) for p in service_time_metrics.keys()],
  File "scripts/analyze.py", line 58, in decode_percentile_key
    return float(k.replace("_", "."))
ValueError: could not convert string to float: 'mean'

This is expected, as the script is not tested and its last real update was in October 2019.
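To illustrate the failure above: the percentile decoder receives every key of the metrics dictionary, including `mean` and `unit`, which are not percentiles. The following is a minimal sketch of one way to guard against that, not the fix that was actually committed. `decode_percentile_key` mirrors the line shown in the traceback; `is_percentile_key` and `percentiles` are hypothetical helpers introduced only for this example.

```python
def decode_percentile_key(k):
    # Same conversion as in the traceback: "99_9" -> 99.9
    return float(k.replace("_", "."))


def is_percentile_key(k):
    # Hypothetical helper: True only for keys such as "50_0" or "100_0".
    return k.replace("_", "", 1).isdigit()


def percentiles(metrics):
    # Decode only percentile keys, skipping entries like "mean" and "unit".
    return [decode_percentile_key(k) for k in metrics if is_percentile_key(k)]


# Keys taken from a "service_time" entry in a race.json file (values rounded).
service_time = {"50_0": 41.26, "90_0": 90.03, "100_0": 141.26, "mean": 47.51, "unit": "ms"}
print(percentiles(service_time))
```

Filtering on the key shape rather than on a list of known non-percentile names means the code keeps working if further summary keys are added later.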

@j-bennet
Contributor Author

Interesting, I thought I had fixed the analyze script. It passed on my race file. Could you share your race file?

@pquentin
Member

Yes, sorry, I should have included it. My invocation is esrally race --track=http_logs --test-mode --distribution-version=8.2.1 --car=4gheap,zgc --team-path=$HOME/src/rally-teams, and the resulting race.json file is:

{
 "rally-version": "2.5.0.dev0 (git revision: 3cfb05e)",
 "rally-revision": "3cfb05e",
 "environment": "local",
 "race-id": "da9e072b-ebe0-4380-817c-897901fb90d4",
 "race-timestamp": "20220531T133153Z",
 "pipeline": "from-distribution",
 "user-tags": {},
 "track": "http_logs",
 "car": [
  "4gheap",
  "zgc"
 ],
 "cluster": {
  "revision": "db223507a0bd08f8e84a93e329764cc39b0043b9",
  "distribution-version": "8.2.1",
  "distribution-flavor": "default",
  "team-revision": null
 },
 "results": {
  "op_metrics": [
   {
    "task": "index-append",
    "operation": "index-append",
    "throughput": {
     "min": 13876.293571182807,
     "mean": 13876.293571182807,
     "median": 13876.293571182807,
     "max": 13876.293571182807,
     "unit": "docs/s"
    },
    "latency": {
     "50_0": 41.259046498453245,
     "90_0": 90.03284649224952,
     "100_0": 141.25881501240656,
     "mean": 47.513707515463466,
     "unit": "ms"
    },
    "service_time": {
     "50_0": 41.259046498453245,
     "90_0": 90.03284649224952,
     "100_0": 141.25881501240656,
     "mean": 47.513707515463466,
     "unit": "ms"
    },
    "processing_time": {
     "50_0": 49.69329600862693,
     "90_0": 105.61709399917163,
     "100_0": 157.76885402738117,
     "mean": 56.82962951790874,
     "unit": "ms"
    },
    "error_rate": 0.0,
    "duration": 411.33441700367257
   },
   {
    "task": "default",
    "operation": "default",
    "throughput": {
     "min": 51.23815341013206,
     "mean": 51.23815341013206,
     "median": 51.23815341013206,
     "max": 51.23815341013206,
     "unit": "ops/s"
    },
    "latency": {
     "100_0": 27.639028994599357,
     "mean": 27.639028994599357,
     "unit": "ms"
    },
    "service_time": {
     "100_0": 7.870311994338408,
     "mean": 7.870311994338408,
     "unit": "ms"
    },
    "processing_time": {
     "100_0": 8.21029898361303,
     "mean": 8.21029898361303,
     "unit": "ms"
    },
    "error_rate": 0.0,
    "duration": 22.10435099550523
   },
   {
    "task": "term",
    "operation": "term",
    "throughput": {
     "min": 76.28850753462655,
     "mean": 76.28850753462655,
     "median": 76.28850753462655,
     "max": 76.28850753462655,
     "unit": "ops/s"
    },
    "latency": {
     "100_0": 21.60522897611372,
     "mean": 21.60522897611372,
     "unit": "ms"
    },
    "service_time": {
     "100_0": 8.040532993618399,
     "mean": 8.040532993618399,
     "unit": "ms"
    },
    "processing_time": {
     "100_0": 8.639666979433969,
     "mean": 8.639666979433969,
     "unit": "ms"
    },
    "error_rate": 0.0,
    "duration": 16.60796400392428
   },
   {
    "task": "terms_enum",
    "operation": "terms_enum",
    "throughput": {
     "min": 26.303833032309853,
     "mean": 26.303833032309853,
     "median": 26.303833032309853,
     "max": 26.303833032309853,
     "unit": "ops/s"
    },
    "latency": {
     "100_0": 64.07701599528082,
     "mean": 64.07701599528082,
     "unit": "ms"
    },
    "service_time": {
     "100_0": 25.792030995944515,
     "mean": 25.792030995944515,
     "unit": "ms"
    },
    "processing_time": {
     "100_0": 26.063408004119992,
     "mean": 26.063408004119992,
     "unit": "ms"
    },
    "error_rate": 0.0,
    "duration": 40.80254698055796
   },
   {
    "task": "range",
    "operation": "range",
    "throughput": {
     "min": 86.64843201256042,
     "mean": 86.64843201256042,
     "median": 86.64843201256042,
     "max": 86.64843201256042,
     "unit": "ops/s"
    },
    "latency": {
     "100_0": 19.153380999341607,
     "mean": 19.153380999341607,
     "unit": "ms"
    },
    "service_time": {
     "100_0": 6.965659005800262,
     "mean": 6.965659005800262,
     "unit": "ms"
    },
    "processing_time": {
     "100_0": 7.622809003805742,
     "mean": 7.622809003805742,
     "unit": "ms"
    },
    "error_rate": 0.0,
    "duration": 14.915320003638044
   },
   {
    "task": "200s-in-range",
    "operation": "200s-in-range",
    "throughput": {
     "min": 86.03082855589855,
     "mean": 86.03082855589855,
     "median": 86.03082855589855,
     "max": 86.03082855589855,
     "unit": "ops/s"
    },
    "latency": {
     "100_0": 17.148855986306444,
     "mean": 17.148855986306444,
     "unit": "ms"
    },
    "service_time": {
     "100_0": 5.197281978325918,
     "mean": 5.197281978325918,
     "unit": "ms"
    },
    "processing_time": {
     "100_0": 5.5664340034127235,
     "mean": 5.5664340034127235,
     "unit": "ms"
    },
    "error_rate": 0.0,
    "duration": 15.719990013167262
   },
   {
    "task": "400s-in-range",
    "operation": "400s-in-range",
    "throughput": {
     "min": 120.03960962765805,
     "mean": 120.03960962765805,
     "median": 120.03960962765805,
     "max": 120.03960962765805,
     "unit": "ops/s"
    },
    "latency": {
     "100_0": 14.47606598958373,
     "mean": 14.47606598958373,
     "unit": "ms"
    },
    "service_time": {
     "100_0": 5.6747299968265,
     "mean": 5.6747299968265,
     "unit": "ms"
    },
    "processing_time": {
     "100_0": 6.160617980640382,
     "mean": 6.160617980640382,
     "unit": "ms"
    },
    "error_rate": 0.0,
    "duration": 12.089483992895111
   },
   {
    "task": "hourly_agg",
    "operation": "hourly_agg",
    "throughput": {
     "min": 14.824214663386773,
     "mean": 14.824214663386773,
     "median": 14.824214663386773,
     "max": 14.824214663386773,
     "unit": "ops/s"
    },
    "latency": {
     "100_0": 81.01819199509919,
     "mean": 81.01819199509919,
     "unit": "ms"
    },
    "service_time": {
     "100_0": 13.307387998793274,
     "mean": 13.307387998793274,
     "unit": "ms"
    },
    "processing_time": {
     "100_0": 13.555782003095374,
     "mean": 13.555782003095374,
     "unit": "ms"
    },
    "error_rate": 0.0,
    "duration": 70.53841598099098
   },
   {
    "task": "scroll",
    "operation": "scroll",
    "throughput": {
     "min": 29.68024052414308,
     "mean": 29.68024052414308,
     "median": 29.68024052414308,
     "max": 29.68024052414308,
     "unit": "pages/s"
    },
    "latency": {
     "100_0": 566.3801409828011,
     "mean": 566.3801409828011,
     "unit": "ms"
    },
    "service_time": {
     "100_0": 295.79545799060725,
     "mean": 295.79545799060725,
     "unit": "ms"
    },
    "processing_time": {
     "100_0": 296.62710198317654,
     "mean": 296.62710198317654,
     "unit": "ms"
    },
    "error_rate": 0.0,
    "duration": 281.21734599699266
   },
   {
    "task": "desc_sort_timestamp",
    "operation": "desc_sort_timestamp",
    "throughput": {
     "min": 22.92276288355545,
     "mean": 22.92276288355545,
     "median": 22.92276288355545,
     "max": 22.92276288355545,
     "unit": "ops/s"
    },
    "latency": {
     "100_0": 53.968891996191815,
     "mean": 53.968891996191815,
     "unit": "ms"
    },
    "service_time": {
     "100_0": 10.05093997810036,
     "mean": 10.05093997810036,
     "unit": "ms"
    },
    "processing_time": {
     "100_0": 10.309220990166068,
     "mean": 10.309220990166068,
     "unit": "ms"
    },
    "error_rate": 0.0,
    "duration": 54.11532800644636
   },
   {
    "task": "asc_sort_timestamp",
    "operation": "asc_sort_timestamp",
    "throughput": {
     "min": 91.94909625017812,
     "mean": 91.94909625017812,
     "median": 91.94909625017812,
     "max": 91.94909625017812,
     "unit": "ops/s"
    },
    "latency": {
     "100_0": 19.669803994474933,
     "mean": 19.669803994474933,
     "unit": "ms"
    },
    "service_time": {
     "100_0": 8.43160500517115,
     "mean": 8.43160500517115,
     "unit": "ms"
    },
    "processing_time": {
     "100_0": 8.792451990302652,
     "mean": 8.792451990302652,
     "unit": "ms"
    },
    "error_rate": 0.0,
    "duration": 14.317564986413345
   },
   {
    "task": "desc_sort_with_after_timestamp",
    "operation": "desc_sort_with_after_timestamp",
    "throughput": {
     "min": 97.63732017319242,
     "mean": 97.63732017319242,
     "median": 97.63732017319242,
     "max": 97.63732017319242,
     "unit": "ops/s"
    },
    "latency": {
     "100_0": 17.249081982299685,
     "mean": 17.249081982299685,
     "unit": "ms"
    },
    "service_time": {
     "100_0": 6.75946197588928,
     "mean": 6.75946197588928,
     "unit": "ms"
    },
    "processing_time": {
     "100_0": 7.007138017797843,
     "mean": 7.007138017797843,
     "unit": "ms"
    },
    "error_rate": 0.0,
    "duration": 13.141165021806955
   },
   {
    "task": "asc_sort_with_after_timestamp",
    "operation": "asc_sort_with_after_timestamp",
    "throughput": {
     "min": 101.68625977331959,
     "mean": 101.68625977331959,
     "median": 101.68625977331959,
     "max": 101.68625977331959,
     "unit": "ops/s"
    },
    "latency": {
     "100_0": 17.395920993294567,
     "mean": 17.395920993294567,
     "unit": "ms"
    },
    "service_time": {
     "100_0": 7.3115739796776325,
     "mean": 7.3115739796776325,
     "unit": "ms"
    },
    "processing_time": {
     "100_0": 7.558141980553046,
     "mean": 7.558141980553046,
     "unit": "ms"
    },
    "error_rate": 0.0,
    "duration": 12.949696014402434
   },
   {
    "task": "desc-sort-timestamp-after-force-merge-1-seg",
    "operation": "desc_sort_timestamp",
    "throughput": {
     "min": 62.707312332740294,
     "mean": 62.707312332740294,
     "median": 62.707312332740294,
     "max": 62.707312332740294,
     "unit": "ops/s"
    },
    "latency": {
     "100_0": 30.55372298695147,
     "mean": 30.55372298695147,
     "unit": "ms"
    },
    "service_time": {
     "100_0": 14.189499983331189,
     "mean": 14.189499983331189,
     "unit": "ms"
    },
    "processing_time": {
     "100_0": 14.6956889948342,
     "mean": 14.6956889948342,
     "unit": "ms"
    },
    "error_rate": 0.0,
    "duration": 19.727142003830522
   },
   {
    "task": "asc-sort-timestamp-after-force-merge-1-seg",
    "operation": "asc_sort_timestamp",
    "throughput": {
     "min": 72.07141322931791,
     "mean": 72.07141322931791,
     "median": 72.07141322931791,
     "max": 72.07141322931791,
     "unit": "ops/s"
    },
    "latency": {
     "100_0": 27.752899011829868,
     "mean": 27.752899011829868,
     "unit": "ms"
    },
    "service_time": {
     "100_0": 13.544212008127943,
     "mean": 13.544212008127943,
     "unit": "ms"
    },
    "processing_time": {
     "100_0": 13.849948009010404,
     "mean": 13.849948009010404,
     "unit": "ms"
    },
    "error_rate": 0.0,
    "duration": 17.674290982540697
   },
   {
    "task": "desc-sort-with-after-timestamp-after-force-merge-1-seg",
    "operation": "desc_sort_with_after_timestamp",
    "throughput": {
     "min": 87.14440947008653,
     "mean": 87.14440947008653,
     "median": 87.14440947008653,
     "max": 87.14440947008653,
     "unit": "ops/s"
    },
    "latency": {
     "100_0": 23.33819499472156,
     "mean": 23.33819499472156,
     "unit": "ms"
    },
    "service_time": {
     "100_0": 10.797277005622163,
     "mean": 10.797277005622163,
     "unit": "ms"
    },
    "processing_time": {
     "100_0": 11.519136984134093,
     "mean": 11.519136984134093,
     "unit": "ms"
    },
    "error_rate": 0.0,
    "duration": 18.578582006739452
   },
   {
    "task": "asc-sort-with-after-timestamp-after-force-merge-1-seg",
    "operation": "asc_sort_with_after_timestamp",
    "throughput": {
     "min": 74.37962068078843,
     "mean": 74.37962068078843,
     "median": 74.37962068078843,
     "max": 74.37962068078843,
     "unit": "ops/s"
    },
    "latency": {
     "100_0": 21.507012977963313,
     "mean": 21.507012977963313,
     "unit": "ms"
    },
    "service_time": {
     "100_0": 7.555655989563093,
     "mean": 7.555655989563093,
     "unit": "ms"
    },
    "processing_time": {
     "100_0": 8.044564019655809,
     "mean": 8.044564019655809,
     "unit": "ms"
    },
    "error_rate": 0.0,
    "duration": 17.142892989795655
   }
  ],
  "total_time": 1858,
  "total_time_per_shard": {
   "min": 0,
   "median": 40.5,
   "max": 148,
   "unit": "ms"
  },
  "indexing_throttle_time": 0,
  "indexing_throttle_time_per_shard": {
   "min": 0,
   "median": 0.0,
   "max": 0,
   "unit": "ms"
  },
  "merge_time": 207,
  "merge_time_per_shard": {
   "min": 0,
   "median": 5.0,
   "max": 41,
   "unit": "ms"
  },
  "merge_count": 30,
  "refresh_time": 743,
  "refresh_time_per_shard": {
   "min": 0,
   "median": 19.0,
   "max": 47,
   "unit": "ms"
  },
  "refresh_count": 357,
  "flush_time": 0,
  "flush_time_per_shard": {
   "min": 0,
   "median": 0.0,
   "max": 0,
   "unit": "ms"
  },
  "flush_count": 0,
  "merge_throttle_time": 0,
  "merge_throttle_time_per_shard": {
   "min": 0,
   "median": 0.0,
   "max": 0,
   "unit": "ms"
  },
  "ml_processing_time": [],
  "young_gc_time": null,
  "young_gc_count": null,
  "old_gc_time": null,
  "old_gc_count": null,
  "zgc_cycles_gc_time": 117,
  "zgc_cycles_gc_count": 1,
  "zgc_pauses_gc_time": 0,
  "zgc_pauses_gc_count": 3,
  "memory_segments": 0,
  "memory_doc_values": 0,
  "memory_terms": 0,
  "memory_norms": 0,
  "memory_points": 0,
  "memory_stored_fields": 0,
  "store_size": 1111303,
  "translog_size": 2200,
  "segment_count": 35,
  "total_transform_search_times": [],
  "total_transform_index_times": [],
  "total_transform_processing_times": [],
  "total_transform_throughput": [],
  "ingest_pipeline_cluster_count": 0,
  "ingest_pipeline_cluster_time": 0,
  "ingest_pipeline_cluster_failed": 0,
  "disk_usage_total": [],
  "disk_usage_inverted_index": [],
  "disk_usage_stored_fields": [],
  "disk_usage_doc_values": [],
  "disk_usage_points": [],
  "disk_usage_norms": [],
  "disk_usage_term_vectors": []
 },
 "track-revision": "ebd1ab6",
 "challenge": "append-no-conflicts"
}
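For readers who want to pull the new ZGC values out of a race.json like the one above, here is a minimal sketch. It assumes only the key names visible in the `results` section of that file; `GC_PREFIXES` and `gc_stats` are hypothetical names for this example and are not part of Rally. Building the keys from a short list of prefixes is also one possible way to avoid repeating each metric name by hand.

```python
import json

# GC-related key prefixes as they appear under "results" in the race.json above.
GC_PREFIXES = ["young_gc", "old_gc", "zgc_cycles_gc", "zgc_pauses_gc"]


def gc_stats(results):
    # Collect <prefix>_time / <prefix>_count values, dropping entries that
    # are null in the JSON (None in Python), e.g. young/old GC when ZGC is used.
    stats = {}
    for prefix in GC_PREFIXES:
        for suffix in ("time", "count"):
            key = f"{prefix}_{suffix}"
            value = results.get(key)
            if value is not None:
                stats[key] = value
    return stats


# Trimmed copy of the "results" section shown above.
race = json.loads("""
{"results": {"young_gc_time": null, "young_gc_count": null,
             "old_gc_time": null, "old_gc_count": null,
             "zgc_cycles_gc_time": 117, "zgc_cycles_gc_count": 1,
             "zgc_pauses_gc_time": 0, "zgc_pauses_gc_count": 3}}
""")
print(gc_stats(race["results"]))
```

Note that a value of 0 (as for `zgc_pauses_gc_time`) is kept, since it is a real measurement rather than a missing one.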

@j-bennet
Contributor Author

@pquentin this commit fixes the problem. I'm guessing the analyze script is not used much? Perhaps it should be deleted, instead of attempting to support it?

@pquentin
Member

pquentin commented Jun 1, 2022

> I'm guessing the analyze script is not used much? Perhaps it should be deleted, instead of attempting to support it?

Thanks for the suggestion! We discussed this internally and agreed to remove analyze.py, which is done in #1507. Can you please revert your last commit, which introduces formatting changes, and merge from master once #1507 is in?

Also, sorry that I forgot: can you please document those new values in docs/metrics.rst and docs/summary_report.rst? Search for young_gen in docs to find the relevant parts.

We're getting there!

@j-bennet
Contributor Author

j-bennet commented Jun 1, 2022

> Can you please revert your last commit that introduces formatting changes

Oh man, I ran black . and that did it, and I didn't notice the extra files. So the codebase is not all "blackened"?

@pquentin
Member

pquentin commented Jun 1, 2022

> Oh man, I ran black . and that did it, and I didn't notice the extra files. So the codebase is not all "blackened"?

Yeah, two files are missing (setup.py and docs/conf.py). I'm using make format and my editor to run black, so I had not noticed it.

Member

@pquentin pquentin left a comment

(Marking as "changes requested" for the docs)

@j-bennet j-bennet requested a review from pquentin June 1, 2022 17:32
@j-bennet j-bennet force-pushed the j-bennet/1052-display-zgc-stats branch from b9c0fa6 to 634aca3 Compare June 1, 2022 18:03
Member

@pquentin pquentin left a comment

Thanks for iterating! LGTM.

@j-bennet j-bennet merged commit ecaa57e into elastic:master Jun 2, 2022
@j-bennet j-bennet deleted the j-bennet/1052-display-zgc-stats branch June 2, 2022 16:15
@pquentin pquentin added the enhancement Improves the status quo label Jun 23, 2022
@pquentin pquentin added this to the 2.5.0 milestone Jun 23, 2022