-
Notifications
You must be signed in to change notification settings - Fork 313
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Record shard allocation #1258
Record shard allocation #1258
Conversation
Relates to elastic#1242
@gingerwizard I started with _cat/shards API for now, it gives you storage and docs count. do you think it's sufficient or would you prefer segments? I could use node stats api with ?level-shards, which will have segement info as well, but then it gives node id instead of of node name.. not sure if that's important. Thoughts? |
@ebadyano could you provide some sample docs here. We defn need the node name for each doc. Shard level analysis is sufficient although a segment count will be better. |
For _cat/shards reposne The sample doc would like like this:
metadata:
@gingerwizard I wonder if we want to have store always in bytes? And for primary/replica instead of |
So i think we need:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some comments: we should avoid the cat API, also user documentation is missing.
esrally/telemetry.py
Outdated
try: | ||
stats = self.client.cat.shards(index=self.indices) | ||
except elasticsearch.TransportError: | ||
msg = "A transport error occurred while collecting _cat/shards on cluster [{}]".format(self.cluster_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should use f-strings for all new code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you considered using https://github.com/asottile/pyupgrade as part of make lint
to enforce f-strings everywhere? There are ways to avoid ruining git blame if that's a concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for the idea; I wasn't aware of that. pyupgrade could really be a nice approach to migrate to f-strings across the code base. We already discussed a while back using black for automatic formatting in another tool that we maintain but did not get around to introducing it yet so mid-term this could indeed be an option.
@gingerwizard Example of a doc pushed to the metric store:
|
The structure LGTM @ebadyano and fullfils the requirements thanks. I will now review the code. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for iterating. I left a couple more comments. Can you please ensure we have user docs?
|
||
for index_name, stats in shard_stats.items(): | ||
for curr_shard in stats: | ||
for shard_id, curr_stats in curr_shard.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make this more robust in case of missing properties? A benchmark should not fail because a property wasn't contained in the response.
esrally/telemetry.py
Outdated
for cluster_name in self.indices_per_cluster.keys(): | ||
if cluster_name not in clients: | ||
raise exceptions.SystemSetupError( | ||
f"The telemetry parameter 'shard-stats-transforms' must be a JSON Object with keys " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter is called shard-stats-indices
, can you please align the names? Also, we should add user docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @ebadyano this largely looks good. A few comments re exceptions and I'm not clear how we support no shard-stats-indices
. I think we need a better default value. Also we need docs!
esrally/telemetry.py
Outdated
May optionally specify: | ||
``shard-stats-indices``: JSON structure specifying the index pattern per cluster to publish stats from. | ||
Not all clusters need to be specified, but any name used must be be present in target.hosts. Alternatively, | ||
the index pattern can be specified as a string can be specified in case only one cluster is involved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This last sentence doesn't quite make sense, maybe
"Alternatively, the index pattern can be specified as a string in the event only one cluster is involved."
esrally/telemetry.py
Outdated
f"The telemetry parameter 'shard-stats-sample-interval' must be greater than zero but was {self.sample_interval}.") | ||
|
||
self.specified_cluster_names = self.clients.keys() | ||
indices_per_cluster = self.telemetry_params.get("shard-stats-indices", False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is False
a valid value here? Shouldn't this be a * by default i.e. we monitor all indices?
Its not clear what happens if nothing is specified for shard-stats-indices
given specified_cluster_names
will not be set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe {opts.TargetHosts.DEFAULT: "*"}
as the default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this was leftover from when i used _cat api and there was a way to pass index parameter to the query to only get stats for specific indices. Looks like nodes api in elasticsearch py client doesn't allow that. I can still keep and process on rally side if we think it could be useful. thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't expect that there are many indices in the cluster that are not created because of the benchmark so I'd opt for simplicity and remove the parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed any non-benchmark indices will be negligible. I'd like to filter out internal and system indices but I think we can do that in our dashboarding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the docs. I left a couple of small comments but I think we're almost there now.
docs/telemetry.rst
Outdated
"name": "shard-stats", | ||
"shard-id": "0", | ||
"index": "geonames", | ||
"primary": True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should show example documents in JSON, rather than Python dict format. Can you please lower-case this?
docs/telemetry.rst
Outdated
"primary": True, | ||
"docs": 1000, | ||
"store": 212027, | ||
"segments_count": 8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We mix hyphens (see shard-id
) and underscores here but it would be nicer to read if this is consistent. Can you please use hyphens everywhere?
docs/telemetry.rst
Outdated
shard-stats | ||
-------------- | ||
|
||
The shard-stats telemetry device regularly calls the `cluster node-stats API with level=shard parameter <https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html>`_ and records one metrics document per shard. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "node-stats" -> "nodes-stats"
docs/telemetry.rst
Outdated
|
||
The shard-stats telemetry device regularly calls the `cluster node-stats API with level=shard parameter <https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html>`_ and records one metrics document per shard. | ||
|
||
Example of the recorded document:: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "the" -> "a"
esrally/telemetry.py
Outdated
"node": node_name | ||
} | ||
self.metrics_store.put_doc(doc, level=MetaInfoScope.cluster, meta_data=shard_metadata) | ||
print(f"shards {doc}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leftover?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, thank you for catching
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please remove the print statement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we get rid of a leftover I think we're good. :)
esrally/telemetry.py
Outdated
"node": node_name | ||
} | ||
self.metrics_store.put_doc(doc, level=MetaInfoScope.cluster, meta_data=shard_metadata) | ||
print(f"shards {doc}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please remove the print statement?
that's one stubborn print! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the print
now gone, LGTM :)
Relates to elastic#1258
Relates to #1242