Record shard allocation #1258

ebadyano · 2021-04-29T14:38:01Z

Relates to #1242

ebadyano · 2021-04-29T14:40:38Z

@gingerwizard I started with the _cat/shards API for now; it gives you the store size and docs count. Do you think that's sufficient, or would you prefer segments? I could use the node stats API with ?level=shards, which includes segment info as well, but it then gives the node ID instead of the node name. Not sure if that's important. Thoughts?

gingerwizard · 2021-04-29T19:48:35Z

@ebadyano could you provide some sample docs here? We definitely need the node name for each doc. Shard-level analysis is sufficient, although a segment count would be better.

ebadyano · 2021-05-03T12:58:16Z

For the _cat/shards response "geonames 4 p STARTED 211237 63mb 127.0.0.1 rally-node-0"

The sample doc would look like this:

{'name': 'shard-stats', 
'shard-id': '4', 
'index': 'geonames', 
'prirep': 'p', 
'docs': '211237', 
'store': '63mb', 
'node': 'rally-node-0'}

metadata:

{'cluster': 'default'}

@gingerwizard I wonder if we want to always report store in bytes? And for primary/replica, instead of p vs. r, it could be True/False.

gingerwizard · 2021-05-03T13:04:18Z

So I think we need:

A count of segments, if possible.
Yes re primary: true/false rather than p/r. Easier to filter.
And yes, we need to convert all units to bytes so we can do math at aggregation time and chart the results.
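A minimal sketch of both conversions applied to the _cat/shards row quoted above. It assumes the row's columns are index, shard, prirep, state, docs, store, ip and node, and that size suffixes are 1024-based as in Elasticsearch's byte units; `parse_size` and `cat_row_to_doc` are hypothetical helpers, not Rally code.

```python
import re

# Multipliers for the size suffixes shown by the _cat APIs (assumed 1024-based).
_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3, "tb": 1024**4, "pb": 1024**5}


def parse_size(value: str) -> int:
    """Convert a human-readable size such as '63mb' or '1.2gb' into bytes."""
    match = re.fullmatch(r"([0-9]+(?:\.[0-9]+)?)([a-z]+)", value.strip().lower())
    if match is None or match.group(2) not in _UNITS:
        raise ValueError(f"Cannot parse size [{value}]")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def cat_row_to_doc(row: str) -> dict:
    """Turn one _cat/shards row into a shard-stats document."""
    index, shard, prirep, _state, docs, store, _ip, node = row.split()
    return {
        "name": "shard-stats",
        "shard-id": shard,
        "index": index,
        "primary": prirep == "p",    # boolean instead of p/r, easier to filter
        "docs": int(docs),
        "store": parse_size(store),  # always in bytes
        "node": node,
    }


doc = cat_row_to_doc("geonames 4 p STARTED 211237 63mb 127.0.0.1 rally-node-0")
print(doc["primary"], doc["store"])  # True 66060288
```

Unassigned shards have empty docs, store, ip and node columns, so splitting on whitespace would fail for them; a real implementation would need to handle that. Alternatively, the _cat APIs accept a `bytes` query parameter (for example `bytes=b`) that returns sizes as plain numbers, which avoids parsing suffixes altogether.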

danielmitterdorfer

I left some comments: we should avoid the cat API, and user documentation is also missing.

danielmitterdorfer · 2021-05-04T10:08:36Z

esrally/telemetry.py

+        try:
+            stats = self.client.cat.shards(index=self.indices)
+        except elasticsearch.TransportError:
+            msg = "A transport error occurred while collecting _cat/shards on cluster [{}]".format(self.cluster_name)


We should use f-strings for all new code.
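For illustration, the message from the diff above rewritten with an f-string; the local `cluster_name` stands in for `self.cluster_name`. Both forms produce the same string.

```python
cluster_name = "default"  # stand-in for self.cluster_name

# str.format version from the diff
old_msg = "A transport error occurred while collecting _cat/shards on cluster [{}]".format(cluster_name)
# equivalent f-string version
new_msg = f"A transport error occurred while collecting _cat/shards on cluster [{cluster_name}]"

assert old_msg == new_msg
print(new_msg)
```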

pquentin · 2021-05-05

Have you considered using https://github.com/asottile/pyupgrade as part of make lint to enforce f-strings everywhere? There are ways to avoid ruining git blame if that's a concern.

danielmitterdorfer · 2021-05-05

Ah, thanks for the idea; I wasn't aware of that. pyupgrade could indeed be a nice way to migrate to f-strings across the code base. A while back we discussed using black for automatic formatting in another tool that we maintain, but we haven't gotten around to introducing it yet, so mid-term this could be an option.

ebadyano · 2021-05-13T12:07:35Z

@gingerwizard Example of a doc pushed to the metric store:

            {
                "name": "shard-stats",
                "shard-id": "0",
                "index": "geonames",
                "primary": True,
                "docs": 1000,
                "store": 212027,
                "node": "rally0"
            }
I just noticed I forgot the segment count. I'll add it shortly; it's pretty straightforward.

gingerwizard · 2021-05-14T09:51:43Z

The structure LGTM @ebadyano and fulfills the requirements, thanks. I will now review the code.

danielmitterdorfer

Thanks for iterating. I left a couple more comments. Can you please ensure we have user docs?

danielmitterdorfer · 2021-05-14T10:13:18Z

esrally/telemetry.py

+
+            for index_name, stats in shard_stats.items():
+                for curr_shard in stats:
+                    for shard_id, curr_stats in curr_shard.items():


Can we make this more robust in case of missing properties? A benchmark should not fail because a property wasn't contained in the response.
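One way to address this, sketched under assumptions: the nodes-stats response with level=shards is taken to look like `{"nodes": {node_id: {"name": ..., "indices": {"shards": {index: [{shard_id: stats}]}}}}}`, matching the loop in the diff, with `routing.primary`, `docs.count`, `store.size_in_bytes` and `segments.count` per shard. Every lookup goes through `dict.get` with a default, so a missing section yields a placeholder (-1 or None) instead of a `KeyError`. `shard_docs` is a hypothetical helper, and the second shard in the sample is made-up illustration data.

```python
from typing import Any, Dict, Iterator


def shard_docs(sample: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one shard-stats document per shard copy found in a nodes-stats sample."""
    for node_stats in sample.get("nodes", {}).values():
        node_name = node_stats.get("name")
        # "shards" is only present when the API was called with level=shards
        shard_stats = node_stats.get("indices", {}).get("shards", {})
        for index_name, shards in shard_stats.items():
            for curr_shard in shards:
                for shard_id, curr_stats in curr_shard.items():
                    yield {
                        "name": "shard-stats",
                        "shard-id": shard_id,
                        "index": index_name,
                        "primary": curr_stats.get("routing", {}).get("primary"),
                        "docs": curr_stats.get("docs", {}).get("count", -1),
                        "store": curr_stats.get("store", {}).get("size_in_bytes", -1),
                        "segments-count": curr_stats.get("segments", {}).get("count", -1),
                        "node": node_name,
                    }


sample = {
    "nodes": {
        "node-id-1": {
            "name": "rally0",
            "indices": {
                "shards": {
                    "geonames": [
                        {"0": {"routing": {"primary": True}, "docs": {"count": 1000},
                               "store": {"size_in_bytes": 212027}, "segments": {"count": 8}}},
                        # illustration only: a shard whose stats lack store and segments
                        {"1": {"routing": {"primary": False}, "docs": {"count": 10}}},
                    ]
                }
            },
        }
    }
}

for doc in shard_docs(sample):
    print(doc)
```

Whether a placeholder such as -1 or simply omitting the field is preferable depends on how the metrics are aggregated later; omitting it avoids skewing averages.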

danielmitterdorfer · 2021-05-14T10:15:16Z

esrally/telemetry.py

+            for cluster_name in self.indices_per_cluster.keys():
+                if cluster_name not in clients:
+                    raise exceptions.SystemSetupError(
+                        f"The telemetry parameter 'shard-stats-transforms' must be a JSON Object with keys "


The parameter is called shard-stats-indices; can you please align the names? Also, we should add user docs.

gingerwizard

Thanks @ebadyano, this largely looks good. I left a few comments regarding exceptions, and I'm not clear how we handle the case where shard-stats-indices is not specified. I think we need a better default value. Also, we need docs!

gingerwizard · 2021-05-14T09:58:04Z

esrally/telemetry.py

+            May optionally specify:
+            ``shard-stats-indices``: JSON structure specifying the index pattern per cluster to publish stats from.
+            Not all clusters need to be specified, but any name used must be be present in target.hosts. Alternatively,
+            the index pattern can be specified as a string can be specified in case only one cluster is involved.


This last sentence doesn't quite make sense. Maybe:
"Alternatively, the index pattern can be specified as a string in the event only one cluster is involved."

gingerwizard · 2021-05-14T10:04:49Z

esrally/telemetry.py

+                f"The telemetry parameter 'shard-stats-sample-interval' must be greater than zero but was {self.sample_interval}.")
+
+        self.specified_cluster_names = self.clients.keys()
+        indices_per_cluster = self.telemetry_params.get("shard-stats-indices", False)


Is False a valid value here? Shouldn't this default to "*", i.e. monitor all indices?
It's not clear what happens if nothing is specified for shard-stats-indices, given that specified_cluster_names will not be set.

Maybe {opts.TargetHosts.DEFAULT: "*"} as the default.
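A sketch of how the default suggested above could be combined with the string-or-object handling from the docstring quoted earlier. This is illustrative only: `normalize_indices_param` is hypothetical, the plain string "default" stands in for `opts.TargetHosts.DEFAULT` (assumed from the `'cluster': 'default'` metadata earlier in the thread), and `ValueError` stands in for Rally's `exceptions.SystemSetupError`.

```python
from typing import Dict, Iterable, Optional, Union

DEFAULT_CLUSTER = "default"  # assumed stand-in for opts.TargetHosts.DEFAULT


def normalize_indices_param(value: Optional[Union[str, Dict[str, str]]],
                            cluster_names: Iterable[str]) -> Dict[str, str]:
    """Return an index pattern per cluster for the shard-stats-indices parameter."""
    known = set(cluster_names)
    if value is None:
        # nothing specified: monitor all indices on the default cluster
        return {DEFAULT_CLUSTER: "*"}
    if isinstance(value, str):
        # a plain string is only unambiguous when a single cluster is involved
        if len(known) != 1:
            raise ValueError("'shard-stats-indices' must be a JSON object when multiple clusters are involved")
        return {next(iter(known)): value}
    if isinstance(value, dict):
        # every cluster name used must be present in target.hosts
        unknown = sorted(set(value) - known)
        if unknown:
            raise ValueError(f"The telemetry parameter 'shard-stats-indices' refers to unknown clusters {unknown}")
        return dict(value)
    # anything else, including False, is rejected
    raise ValueError(f"Unsupported value for 'shard-stats-indices': {value!r}")


print(normalize_indices_param(None, ["default"]))  # {'default': '*'}
```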

ebadyano · 2021-05-14

This was left over from when I used the _cat API, which accepts an index parameter so the query only returns stats for specific indices. It looks like the nodes API in the elasticsearch-py client doesn't allow that. I could keep the parameter and do the filtering on the Rally side if we think it would be useful. Thoughts?

danielmitterdorfer · 2021-05-17

We don't expect many indices in the cluster that were not created by the benchmark, so I'd opt for simplicity and remove the parameter.

gingerwizard · 2021-05-17

Agreed, any non-benchmark indices will be negligible. I'd like to filter out internal and system indices, but I think we can do that in our dashboards.
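Had the parameter been kept, the Rally-side filtering discussed above could look roughly like this sketch, which also excludes internal and system indices. Two assumptions: such indices are approximated as names starting with a dot, and `fnmatch` wildcards approximate Elasticsearch index patterns (fnmatch also interprets `?` and `[...]`, and does not support comma-separated lists or exclusions). `filter_shard_docs` is hypothetical and works on documents shaped like the example earlier in this thread.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List


def filter_shard_docs(docs: Iterable[Dict], pattern: str = "*",
                      include_dot_indices: bool = False) -> List[Dict]:
    """Keep shard-stats documents whose index name matches a wildcard pattern."""
    kept = []
    for doc in docs:
        index = doc["index"]
        # approximation: internal/system indices are assumed to start with "."
        if index.startswith(".") and not include_dot_indices:
            continue
        if fnmatchcase(index, pattern):
            kept.append(doc)
    return kept


docs = [{"index": "geonames"}, {"index": "logs-2021"}, {"index": ".kibana_1"}]
print([d["index"] for d in filter_shard_docs(docs)])          # ['geonames', 'logs-2021']
print([d["index"] for d in filter_shard_docs(docs, "geo*")])  # ['geonames']
```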

danielmitterdorfer

Thanks for the docs. I left a couple of small comments but I think we're almost there now.

danielmitterdorfer · 2021-05-18T05:39:38Z

docs/telemetry.rst

+     "name": "shard-stats",
+     "shard-id": "0",
+     "index": "geonames",
+     "primary": True,


We should show example documents in JSON, rather than Python dict format. Can you please lower-case this?

danielmitterdorfer · 2021-05-18T05:41:47Z

docs/telemetry.rst

+     "primary": True,
+     "docs": 1000,
+     "store": 212027,
+     "segments_count": 8,


We mix hyphens (see shard-id) and underscores here, but it would be easier to read if this were consistent. Can you please use hyphens everywhere?
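Both review points can be checked mechanically: serializing the document with Python's `json` module produces proper JSON (lower-case `true`), and a small helper can replace underscores in keys with hyphens. `hyphenate_keys` is a hypothetical helper; the field values are the ones from the docs example.

```python
import json


def hyphenate_keys(doc: dict) -> dict:
    """Return a copy of doc with underscores in top-level keys replaced by hyphens."""
    return {key.replace("_", "-"): value for key, value in doc.items()}


doc = {
    "name": "shard-stats",
    "shard-id": "0",
    "index": "geonames",
    "primary": True,
    "docs": 1000,
    "store": 212027,
    "segments_count": 8,
    "node": "rally0",
}

# json.dumps renders True as true, so the output is valid JSON
print(json.dumps(hyphenate_keys(doc), indent=2))
```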

danielmitterdorfer · 2021-05-18T05:45:36Z

docs/telemetry.rst

+shard-stats
+--------------
+
+The shard-stats telemetry device regularly calls the `cluster node-stats API with level=shard parameter <https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html>`_ and records one metrics document per shard.


Nit: "node-stats" -> "nodes-stats"
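For reference, a sketch of the periodic call the docs describe, written against the elasticsearch-py client as `client.nodes.stats(metric="indices", level="shards")`; note that the value Elasticsearch expects for this parameter is `shards` (plural). The class name, the callback, and the `threading.Event`-based loop are illustrative choices, not Rally's implementation; the interval check reuses the error message from the diff reviewed earlier.

```python
import threading
from typing import Any, Callable, Dict


class ShardStatsSampler:
    """Periodically fetch nodes-stats at shard level and pass each sample to a callback."""

    def __init__(self, client, sample_interval: float, on_sample: Callable[[Dict[str, Any]], None]):
        if sample_interval <= 0:
            raise ValueError(
                f"The telemetry parameter 'shard-stats-sample-interval' must be greater than zero "
                f"but was {sample_interval}.")
        self.client = client
        self.sample_interval = sample_interval
        self.on_sample = on_sample
        self.stop_event = threading.Event()

    def sample_once(self) -> None:
        # level="shards" adds per-shard stats under nodes.<node_id>.indices.shards
        self.on_sample(self.client.nodes.stats(metric="indices", level="shards"))

    def run(self) -> None:
        # wait() returns True as soon as stop() has been called, which ends the loop
        while not self.stop_event.wait(self.sample_interval):
            self.sample_once()

    def stop(self) -> None:
        self.stop_event.set()
```

In practice `run()` would be started on a background thread and `stop()` called when the benchmark ends.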

danielmitterdorfer · 2021-05-18T05:45:51Z

docs/telemetry.rst

+
+The shard-stats telemetry device regularly calls the `cluster node-stats API with level=shard parameter <https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-stats.html>`_ and records one metrics document per shard.
+
+Example of the recorded document::


Nit: "the" -> "a"

danielmitterdorfer · 2021-05-18T05:48:56Z

esrally/telemetry.py

+                            "node": node_name
+                        }
+                        self.metrics_store.put_doc(doc, level=MetaInfoScope.cluster, meta_data=shard_metadata)
+                        print(f"shards {doc}")


oops, thank you for catching

Can you please remove the print statement?

danielmitterdorfer

Once we get rid of a leftover I think we're good. :)

danielmitterdorfer · 2021-05-18T15:44:51Z

esrally/telemetry.py

+                            "node": node_name
+                        }
+                        self.metrics_store.put_doc(doc, level=MetaInfoScope.cluster, meta_data=shard_metadata)
+                        print(f"shards {doc}")


Can you please remove the print statement?

ebadyano · 2021-05-18T17:00:07Z

that's one stubborn print!

danielmitterdorfer

With the print now gone, LGTM :)

Record shard allocation

5eb71c3

Relates to elastic#1242

ebadyano added the enhancement (Improves the status quo) and :Telemetry (Telemetry Devices that gather additional metrics) labels Apr 29, 2021

ebadyano requested a review from gingerwizard April 29, 2021 14:38

ebadyano self-assigned this Apr 29, 2021

fix lint

0edbc47

danielmitterdorfer added this to the 2.2.1 milestone May 4, 2021

danielmitterdorfer reviewed May 4, 2021

ebadyano added 4 commits May 11, 2021 22:07

Adding segment count

f5120e0

Switch to _nodes/stats api instead of _cat

216986e

Address Daniel's comments

ba0251b

Merge branch 'master' of github.com:elastic/rally into shards-info

12cea18

Add segment count

d9b89c0

danielmitterdorfer reviewed May 14, 2021

gingerwizard suggested changes May 14, 2021

ebadyano added 2 commits May 17, 2021 19:23

Address comments

5047f74

add docs

3ce36fe

ebadyano requested review from gingerwizard and danielmitterdorfer May 17, 2021 23:39

danielmitterdorfer reviewed May 18, 2021

address Daniel's comments

5e3ae0b

ebadyano requested a review from danielmitterdorfer May 18, 2021 13:56

danielmitterdorfer reviewed May 18, 2021

remove print

a425a35

ebadyano requested a review from danielmitterdorfer May 18, 2021 16:59

danielmitterdorfer approved these changes May 18, 2021

ebadyano merged commit 1de6ff8 into elastic:master May 18, 2021

ebadyano added a commit to ebadyano/rally that referenced this pull request May 19, 2021

Fix error for shard-stats

4144627

Relates to elastic#1258

ebadyano mentioned this pull request May 19, 2021

Fix error for shard-stats #1268

Merged

ebadyano added a commit that referenced this pull request May 19, 2021

Fix error for shard-stats (#1268)

d9dcb2c

Relates to #1258

ebadyano deleted the shards-info branch December 16, 2022 15:15
