
Add refresh/merge/flush totals in summary #615

Merged (3 commits) on Dec 14, 2018

Conversation

dliappis (Contributor) commented:

Due to #608 it's likely we need to benchmark scenarios without using
the node-stats telemetry device. At the same time we want to get a
general idea of how many refreshes/merges/flushes happened (in total)
by accessing the index stats.

Add a total count for merges/refresh/flush to the summary output; this is
collected from `_all/primaries` in `_stats`.

Also modify the summary description to clarify that the values are totals from
primary shards.

Finally, fix a bug where index stats with time/count == 0 were skipped
in the summary.

Closes #614
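The totals described above live under the primaries section of the index stats response. A minimal sketch of extracting them (the helper name and the abbreviated sample response are illustrative, not Rally's actual code):

```python
# Sketch: pull cumulative merge/refresh/flush counts for primary shards
# out of an Elasticsearch GET /_stats response. The function name and the
# abbreviated sample below are assumptions for illustration only.

def primaries_totals(stats_response):
    """Extract cumulative operation counts for primary shards."""
    primaries = stats_response["_all"]["primaries"]
    return {
        "merges_total_count": primaries["merges"]["total"],
        "refresh_total_count": primaries["refresh"]["total"],
        "flush_total_count": primaries["flush"]["total"],
    }

# Abbreviated example of the relevant part of a _stats response:
sample = {
    "_all": {
        "primaries": {
            "merges": {"total": 42, "total_time_in_millis": 61000},
            "refresh": {"total": 130, "total_time_in_millis": 8000},
            "flush": {"total": 9, "total_time_in_millis": 1200},
        }
    }
}

print(primaries_totals(sample))
```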

@dliappis dliappis added enhancement Improves the status quo :Telemetry Telemetry Devices that gather additional metrics labels Dec 10, 2018
@dliappis dliappis added this to the 1.0.3 milestone Dec 10, 2018
danielmitterdorfer (Member) left a comment:

Thanks for the PR. I left a couple of comments and suggestions.

```diff
-self.line("Total Young Gen GC", "", stats.young_gc_time, "s", convert.ms_to_seconds),
-self.line("Total Old Gen GC", "", stats.old_gc_time, "s", convert.ms_to_seconds)
+self.line("Total primaries Young Gen GC", "", stats.young_gc_time, "s", convert.ms_to_seconds),
+self.line("Total primaries Old Gen GC", "", stats.old_gc_time, "s", convert.ms_to_seconds)
```
danielmitterdorfer (Member):

I don't think this is correct, as GC times are gathered at node level.

dliappis (Contributor, author):

Good catch, thanks; will fix.

```diff
     return lines

 def report_total_time(self, name, baseline_total, baseline_per_shard, contender_total, contender_per_shard):
     unit = "min"
     return self.join(
-        self.line("Total {}".format(name), baseline_total, contender_total, "", unit,
+        self.line("Total primaries {}".format(name), baseline_total, contender_total, "", unit,
                   treat_increase_as_improvement=False, formatter=convert.ms_to_minutes),
         self.line("Min {} per shard".format(name), baseline_per_shard.get("min"), contender_per_shard.get("min"), "", unit,
```
danielmitterdorfer (Member):

Should it also say here: "... per primary shard"?

dliappis (Contributor, author):

As there can be more than one primary shard, how about "... across primary shards"?

dliappis (Contributor, author):

Or "... of primary shards"?

```diff
                   treat_increase_as_improvement=False, formatter=convert.ms_to_seconds),
-        self.line("Total Old Gen GC", baseline_stats.old_gc_time, contender_stats.old_gc_time, "", "s",
+        self.line("Total primaries Old Gen GC", baseline_stats.old_gc_time, contender_stats.old_gc_time, "", "s",
```
danielmitterdorfer (Member):

Similarly, I don't think that the GC times should mention primaries.

```python
doc = {
    "name": name,
    "value": value,
    "unit": "",
```
danielmitterdorfer (Member):

I don't think we should specify a unit at all for a count?

```python
counts = []
self.index_count(counts, stats, "merges_total_count", ["merges", "total"])
self.index_count(counts, stats, "refresh_total_count", ["refresh", "total"])
self.index_count(counts, stats, "flush_total_count", ["flush", "total"])
```
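An `index_count`-style helper presumably resolves the given path into the nested stats dict and records a metric doc; a hedged standalone sketch of that behavior (not Rally's actual implementation — the unit-less doc shape follows the review comment above):

```python
# Illustrative sketch of an index_count-style helper: walk a key path into a
# nested stats dict and append a metric doc. Names and doc shape are
# assumptions for illustration, not Rally's actual code.

def index_count(counts, stats, name, path):
    value = stats
    for key in path:
        value = value[key]
    # A plain count carries no unit.
    counts.append({"name": name, "value": value})

stats = {"merges": {"total": 5}, "refresh": {"total": 17}, "flush": {"total": 2}}
counts = []
index_count(counts, stats, "merges_total_count", ["merges", "total"])
index_count(counts, stats, "refresh_total_count", ["refresh", "total"])
index_count(counts, stats, "flush_total_count", ["flush", "total"])
print(counts)
```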
danielmitterdorfer (Member):

I think these new metric keys should be documented at https://esrally.readthedocs.io/en/stable/metrics.html#metric-keys. Also, the corresponding docs for the summary report should be amended.

dliappis (Contributor, author):

Will address.

```python
        self.line("Min {} per shard".format(name), "", total_time_per_shard.get("min"), unit, convert.ms_to_minutes),
        self.line("Median {} per shard".format(name), "", total_time_per_shard.get("median"), unit, convert.ms_to_minutes),
        self.line("Max {} per shard".format(name), "", total_time_per_shard.get("max"), unit, convert.ms_to_minutes),
    )

def report_total_counts(self, stats):
    lines = []
    lines.extend(self.report_total_count("merge count", stats.merge_count))
```
danielmitterdorfer (Member):

I think it would be nicer if the output were grouped by action (merge, refresh, flush) instead of all the times together and all the counts together.

```diff
@@ -771,12 +790,18 @@ def report_total_times(self, baseline_stats, contender_stats):
     lines.extend(self.report_total_time("flush time",
                                         baseline_stats.flush_time, baseline_stats.flush_time_per_shard,
                                         contender_stats.flush_time, contender_stats.flush_time_per_shard))
+    lines.extend(self.report_total_count("merge count",
```
danielmitterdorfer (Member):

I think it would be nicer if the output were grouped by action (merge, refresh, flush) instead of all the times together and all the counts together, and if total counts were grouped directly below total times in all summary reports.
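The suggested grouping — each action's time lines immediately followed by its count line, using the "cumulative ... of primary shards" wording the PR later adopts — could be sketched like this (labels and sample data are illustrative, not Rally's actual report output):

```python
# Sketch of grouping summary lines by action (merge, refresh, flush) rather
# than emitting all times together and all counts together. The label format
# and the sample numbers are assumptions for illustration.

def grouped_report(metrics):
    """metrics: list of (action, cumulative_time_in_minutes, cumulative_count)."""
    lines = []
    for action, time_min, count in metrics:
        # Time and count for the same action stay adjacent in the report.
        lines.append("Cumulative {} time of primary shards [min]: {}".format(action, time_min))
        lines.append("Cumulative {} count of primary shards: {}".format(action, count))
    return lines

report = grouped_report([("merge", 1.02, 42), ("refresh", 0.13, 130), ("flush", 0.02, 9)])
for line in report:
    print(line)
```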
danielmitterdorfer (Member) left a comment:

Thanks for iterating. LGTM

Use cumulative instead of total for summary reports and comparisons
and clarify that min/median/max are across primary shards.

Relates: elastic/elasticsearch#35594 (comment)
dliappis (Contributor, author):

@danielmitterdorfer in 3d12203 I switched from "totals" to the new "cumulative" terminology and added the "across primary shards" clarification for the min/median/max metrics; I'd appreciate it if you could take a quick look before merge.

danielmitterdorfer (Member) left a comment:

Thanks, I think it's really clearer now (albeit the command-line report has become quite large by now). LGTM!

Labels
enhancement Improves the status quo :Telemetry Telemetry Devices that gather additional metrics
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add total merges and refreshes in summary output
2 participants