Record shard allocation #1258

Merged
merged 11 commits into from
May 18, 2021

ebadyano
Contributor

Relates to #1242

@ebadyano ebadyano added enhancement Improves the status quo :Telemetry Telemetry Devices that gather additional metrics labels Apr 29, 2021
@ebadyano ebadyano self-assigned this Apr 29, 2021
@ebadyano
Contributor Author

ebadyano commented Apr 29, 2021

@gingerwizard I started with the _cat/shards API for now; it gives you storage and docs count. Do you think it's sufficient, or would you prefer segments? I could use the node stats API with ?level=shards, which will have segment info as well, but then it gives the node ID instead of the node name. Not sure if that's important. Thoughts?

@gingerwizard
Contributor

@ebadyano could you provide some sample docs here? We definitely need the node name for each doc. Shard-level analysis is sufficient, although a segment count would be better.

@ebadyano
Contributor Author

ebadyano commented May 3, 2021

For the _cat/shards response "geonames 4 p STARTED 211237 63mb 127.0.0.1 rally-node-0"

The sample doc would look like this:

{'name': 'shard-stats', 
'shard-id': '4', 
'index': 'geonames', 
'prirep': 'p', 
'docs': '211237', 
'store': '63mb', 
'node': 'rally-node-0'} 

metadata:

{'cluster': 'default'}

@gingerwizard I wonder if we want to have store always in bytes? And for primary/replica instead of p vs r could be True/False
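
As a rough illustration of the mapping described above, here is a minimal sketch that turns one _cat/shards line into the sample document. The column order is assumed from the sample line only, and `CAT_SHARDS_COLUMNS` and `parse_cat_shards_line` are hypothetical names, not part of Rally.

```python
# Column order assumed from the sample line in this comment:
# index, shard, prirep, state, docs, store, ip, node
CAT_SHARDS_COLUMNS = ("index", "shard-id", "prirep", "state", "docs", "store", "ip", "node")


def parse_cat_shards_line(line):
    """Turn one whitespace-separated _cat/shards line into a shard-stats document."""
    fields = dict(zip(CAT_SHARDS_COLUMNS, line.split()))
    return {
        "name": "shard-stats",
        "shard-id": fields["shard-id"],
        "index": fields["index"],
        "prirep": fields["prirep"],
        "docs": fields["docs"],
        "store": fields["store"],
        "node": fields["node"],
    }


doc = parse_cat_shards_line("geonames 4 p STARTED 211237 63mb 127.0.0.1 rally-node-0")
```

Every value stays a string here, which is what prompts the question about bytes and True/False. The sketch also assumes all eight columns are present, so a line with fewer columns would raise a `KeyError`.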

@gingerwizard
Contributor

So I think we need:

  1. A count of segments if possible
  2. Yes re primary: true vs p/r. Easier to filter.
  3. And yes we need to convert all units to bytes so we can perform math at agg time + chart.
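
Points 2 and 3 could be sketched roughly as follows, assuming the string-valued document from the _cat/shards sample and assuming that size suffixes are powers of 1024. `to_bytes` and `normalize` are hypothetical helper names used only for illustration.

```python
import re

# Assumption: byte-size suffixes are binary multiples (1kb = 1024 bytes).
_UNIT_FACTORS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}
_SIZE_PATTERN = re.compile(r"^(\d+(?:\.\d+)?)(b|kb|mb|gb|tb|pb)$")


def to_bytes(size):
    """Convert a human-readable size such as '63mb' into a number of bytes."""
    match = _SIZE_PATTERN.match(size.strip().lower())
    if match is None:
        raise ValueError(f"Unrecognized size value [{size}]")
    value, unit = match.groups()
    return int(float(value) * _UNIT_FACTORS[unit])


def normalize(doc):
    """Replace 'prirep' with a boolean 'primary' and make 'docs' and 'store' numeric."""
    normalized = {k: v for k, v in doc.items() if k not in ("prirep", "docs", "store")}
    normalized["primary"] = doc["prirep"] == "p"
    normalized["docs"] = int(doc["docs"])
    normalized["store"] = to_bytes(doc["store"])
    return normalized
```

A human-readable value like "63mb" is already rounded, so the converted number is only an approximation of the real store size. A source that reports bytes directly avoids that loss.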

@danielmitterdorfer danielmitterdorfer added this to the 2.2.1 milestone May 4, 2021
Member

@danielmitterdorfer danielmitterdorfer left a comment

I left some comments: we should avoid the cat API, also user documentation is missing.

esrally/telemetry.py
try:
    stats = self.client.cat.shards(index=self.indices)
except elasticsearch.TransportError:
    msg = "A transport error occurred while collecting _cat/shards on cluster [{}]".format(self.cluster_name)
Member

We should use f-strings for all new code.
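
For illustration, the reviewed message could be written either way. `cluster_name` below is a plain local variable standing in for `self.cluster_name`, and its value is made up.

```python
cluster_name = "default"  # illustrative value standing in for self.cluster_name

# str.format(), as in the reviewed line
msg_format = "A transport error occurred while collecting _cat/shards on cluster [{}]".format(cluster_name)

# The equivalent f-string
msg_fstring = f"A transport error occurred while collecting _cat/shards on cluster [{cluster_name}]"
```

Both expressions produce the same string. The f-string keeps the interpolated name next to its placeholder.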

Member

Have you considered using https://github.com/asottile/pyupgrade as part of make lint to enforce f-strings everywhere? There are ways to avoid ruining git blame if that's a concern.

Member

Ah, thanks for the idea; I wasn't aware of that. pyupgrade could really be a nice approach to migrate to f-strings across the code base. We already discussed a while back using black for automatic formatting in another tool that we maintain but did not get around to introducing it yet so mid-term this could indeed be an option.

@ebadyano
Contributor Author

ebadyano commented May 13, 2021

@gingerwizard Example of a doc pushed to the metric store:

            {
                "name": "shard-stats",
                "shard-id": "0",
                "index": "geonames",
                "primary": True,
                "docs": 1000,
                "store": 212027,
                "node": "rally0"
            }
Just noticed I forgot the segment count. Will add it shortly; it's pretty straightforward.

@gingerwizard
Contributor

The structure LGTM @ebadyano and fulfills the requirements, thanks. I will now review the code.

Member

@danielmitterdorfer danielmitterdorfer left a comment

Thanks for iterating. I left a couple more comments. Can you please ensure we have user docs?


for index_name, stats in shard_stats.items():
    for curr_shard in stats:
        for shard_id, curr_stats in curr_shard.items():
Member

Can we make this more robust in case of missing properties? A benchmark should not fail because a property wasn't contained in the response.
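
One way to tolerate missing properties is to look each value up defensively and leave it out of the document when it is absent. The sketch below reuses the loop shape from the reviewed code. The property paths in `OPTIONAL_PROPERTIES` are assumptions about the stats response, and `lookup` and `shard_docs` are hypothetical names, not Rally functions.

```python
# Assumed paths into the per-shard stats; adjust them to the actual response.
OPTIONAL_PROPERTIES = {
    "primary": ("routing", "primary"),
    "docs": ("docs", "count"),
    "store": ("store", "size_in_bytes"),
    "segments-count": ("segments", "count"),
}


def lookup(stats, path):
    """Follow a path of keys through nested dicts; return None if any step is missing."""
    current = stats
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def shard_docs(shard_stats, node_name):
    """Yield one document per shard, omitting properties absent from the response."""
    for index_name, stats in shard_stats.items():
        for curr_shard in stats:
            for shard_id, curr_stats in curr_shard.items():
                doc = {"name": "shard-stats", "shard-id": shard_id, "index": index_name, "node": node_name}
                for field, path in OPTIONAL_PROPERTIES.items():
                    value = lookup(curr_stats, path)
                    if value is not None:
                        doc[field] = value
                yield doc
```

With this shape, a shard that reports only a docs count still yields a document, and nothing raises a `KeyError` in the middle of a benchmark.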

for cluster_name in self.indices_per_cluster.keys():
    if cluster_name not in clients:
        raise exceptions.SystemSetupError(
            f"The telemetry parameter 'shard-stats-transforms' must be a JSON Object with keys "
Member

The parameter is called shard-stats-indices, can you please align the names? Also, we should add user docs.

Contributor

@gingerwizard gingerwizard left a comment

Thanks @ebadyano, this largely looks good. A few comments re exceptions, and I'm not clear how we support no shard-stats-indices. I think we need a better default value. Also, we need docs!

May optionally specify:
``shard-stats-indices``: JSON structure specifying the index pattern per cluster to publish stats from.
Not all clusters need to be specified, but any name used must be be present in target.hosts. Alternatively,
the index pattern can be specified as a string can be specified in case only one cluster is involved.
Contributor

This last sentence doesn't quite make sense, maybe
"Alternatively, the index pattern can be specified as a string in the event only one cluster is involved."

f"The telemetry parameter 'shard-stats-sample-interval' must be greater than zero but was {self.sample_interval}.")

self.specified_cluster_names = self.clients.keys()
indices_per_cluster = self.telemetry_params.get("shard-stats-indices", False)
Contributor

Is False a valid value here? Shouldn't this be a * by default, i.e. we monitor all indices?
It's not clear what happens if nothing is specified for shard-stats-indices, given specified_cluster_names will not be set.

Contributor

maybe {opts.TargetHosts.DEFAULT: "*"} as the default.
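
A minimal sketch of that suggestion, combined with the string shorthand from the reviewed docstring, might look like this. `DEFAULT_CLUSTER` is a hypothetical stand-in for `opts.TargetHosts.DEFAULT`, `resolve_indices_per_cluster` is an illustrative name, and `ValueError` is used only to keep the example self-contained.

```python
DEFAULT_CLUSTER = "default"  # hypothetical stand-in for opts.TargetHosts.DEFAULT


def resolve_indices_per_cluster(telemetry_params, cluster_names):
    """Return a {cluster name: index pattern} mapping, defaulting to all indices."""
    value = telemetry_params.get("shard-stats-indices", {DEFAULT_CLUSTER: "*"})
    if isinstance(value, str):
        # A bare string is only unambiguous when exactly one cluster is involved.
        if len(cluster_names) != 1:
            raise ValueError("A string value for 'shard-stats-indices' requires exactly one cluster.")
        return {cluster_names[0]: value}
    if not isinstance(value, dict):
        raise ValueError("'shard-stats-indices' must be a JSON object or a string.")
    unknown = sorted(set(value) - set(cluster_names))
    if unknown:
        raise ValueError(f"'shard-stats-indices' refers to unknown clusters {unknown}.")
    return dict(value)
```

Defaulting to a mapping rather than `False` means the rest of the device can always iterate over cluster names without a special case.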

Contributor Author

So, this was left over from when I used the _cat API and there was a way to pass an index parameter to the query to only get stats for specific indices. It looks like the nodes API in the Elasticsearch Python client doesn't allow that. I can still keep it and process on the Rally side if we think it could be useful. Thoughts?

Member

We don't expect that there are many indices in the cluster that are not created because of the benchmark so I'd opt for simplicity and remove the parameter.

Contributor

Agreed, any non-benchmark indices will be negligible. I'd like to filter out internal and system indices, but I think we can do that in our dashboarding.

Member

@danielmitterdorfer danielmitterdorfer left a comment

Thanks for the docs. I left a couple of small comments but I think we're almost there now.

"name": "shard-stats",
"shard-id": "0",
"index": "geonames",
"primary": True,
Member

We should show example documents in JSON, rather than Python dict format. Can you please lower-case this?
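
One way to get the JSON form without retyping it is to serialize the Python dict with the standard `json` module, which renders `True` as `true`. The values below are copied from the example document earlier in this thread.

```python
import json

doc = {
    "name": "shard-stats",
    "shard-id": "0",
    "index": "geonames",
    "primary": True,
    "docs": 1000,
    "store": 212027,
    "node": "rally0",
}

# json.dumps emits JSON literals, so the Python boolean becomes lower-case true.
rendered = json.dumps(doc, indent=2)
print(rendered)
```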

"primary": True,
"docs": 1000,
"store": 212027,
"segments_count": 8,
Member

We mix hyphens (see shard-id) and underscores here but it would be nicer to read if this is consistent. Can you please use hyphens everywhere?

shard-stats
--------------

The shard-stats telemetry device regularly calls the `cluster node-stats API with level=shard parameter <https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html>`_ and records one metrics document per shard.
Member

Nit: "node-stats" -> "nodes-stats"


The shard-stats telemetry device regularly calls the `cluster node-stats API with level=shard parameter <https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html>`_ and records one metrics document per shard.

Example of the recorded document::
Member

Nit: "the" -> "a"

"node": node_name
}
self.metrics_store.put_doc(doc, level=MetaInfoScope.cluster, meta_data=shard_metadata)
print(f"shards {doc}")
Member

Leftover?

Contributor Author

oops, thank you for catching

Member

Can you please remove the print statement?

Member

@danielmitterdorfer danielmitterdorfer left a comment

Once we get rid of a leftover I think we're good. :)

@ebadyano
Contributor Author

that's one stubborn print!

Member

@danielmitterdorfer danielmitterdorfer left a comment

With the print now gone, LGTM :)

@ebadyano ebadyano merged commit 1de6ff8 into elastic:master May 18, 2021
ebadyano added a commit to ebadyano/rally that referenced this pull request May 19, 2021
ebadyano added a commit that referenced this pull request May 19, 2021
@ebadyano ebadyano deleted the shards-info branch December 16, 2022 15:15