Add refresh/merge/flush totals in summary

Because of #608, we will likely need to benchmark scenarios without the node-stats telemetry device. We still want a general idea of how many refreshes, merges and flushes happened in total, which we can get from the index stats.

Add total counts for merges, refreshes and flushes to the summary output; they are collected from `_all/primaries` in `_stats`.

Also clarify that these values are cumulative across primary shards, and clarify the description of min/median/max in the summary report.

Finally, fix a bug where index stats with a time or count of 0 were skipped from the summary.

Closes #614
Relates: #615
Relates: elastic/elasticsearch#35594 (comment)
elastic · Dec 14, 2018 · be6336c
1 parent 738959a
Showing 7 changed files with 359 additions and 143 deletions.
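The commit message says the new totals are read from `_all/primaries` in the `_stats` response. A minimal sketch of that lookup follows, assuming the response has already been parsed into a Python dict. The helper name `primaries_totals` and the sample payload are hypothetical and are not taken from Rally's code; the field names (`merges.total`, `refresh.total`, `flush.total`) follow the Elasticsearch indices stats API.

```python
def primaries_totals(stats):
    """Extract cumulative merge/refresh/flush counts for primary shards.

    `stats` is assumed to be the parsed JSON body of an indices stats
    (`_stats`) response. This helper is illustrative only.
    """
    primaries = stats["_all"]["primaries"]
    return {
        "merges_total_count": primaries["merges"]["total"],
        "refresh_total_count": primaries["refresh"]["total"],
        "flush_total_count": primaries["flush"]["total"],
    }


# Hypothetical, heavily trimmed response; the counts mirror the sample
# summary report in the quickstart diff.
sample_stats = {
    "_all": {
        "primaries": {
            "merges": {"total": 136},
            "refresh": {"total": 420},
            "flush": {"total": 10},
        }
    }
}

totals = primaries_totals(sample_stats)
```

Reading from `primaries` rather than `total` keeps replica activity out of the counts, which matches the "of primary shards" wording in the new metric descriptions.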
diff --git a/docs/metrics.rst b/docs/metrics.rst
@@ -143,13 +143,16 @@ Rally stores the following metrics:
 * ``segments_terms_memory_in_bytes``: Number of bytes used for terms as reported by the indices stats API.
 * ``segments_norms_memory_in_bytes``: Number of bytes used for norms as reported by the indices stats API.
 * ``segments_points_memory_in_bytes``: Number of bytes used for points as reported by the indices stats API.
-* ``merges_total_time``: Total runtime of merges as reported by the indices stats API. Note that this is not Wall clock time (i.e. if M merge threads ran for N minutes, we will report M * N minutes, not N minutes). These metrics records also have a ``per-shard`` property that contains the times per primary shard in an array.
-* ``merges_total_throttled_time``: Total time within merges have been throttled as reported by the indices stats API. Note that this is not Wall clock time.  These metrics records also have a ``per-shard`` property that contains the times per primary shard in an array.
-* ``indexing_total_time``: Total time used for indexing as reported by the indices stats API. Note that this is not Wall clock time.  These metrics records also have a ``per-shard`` property that contains the times per primary shard in an array.
-* ``indexing_throttle_time``: Total time that indexing has been throttled as reported by the indices stats API. Note that this is not Wall clock time.  These metrics records also have a ``per-shard`` property that contains the times per primary shard in an array.
-* ``refresh_total_time``: Total time used for index refresh as reported by the indices stats API. Note that this is not Wall clock time.  These metrics records also have a ``per-shard`` property that contains the times per primary shard in an array.
-* ``flush_total_time``: Total time used for index flush as reported by the indices stats API. Note that this is not Wall clock time.  These metrics records also have a ``per-shard`` property that contains the times per primary shard in an array.
+* ``merges_total_time``: Cumulative runtime of merges of primary shards, as reported by the indices stats API. Note that this is not Wall clock time (i.e. if M merge threads ran for N minutes, we will report M * N minutes, not N minutes). These metrics records also have a ``per-shard`` property that contains the times across primary shards in an array.
+* ``merges_total_count``: Cumulative number of merges of primary shards, as reported by indices stats API under ``_all/primaries``.
+* ``merges_total_throttled_time``: Cumulative time for which merges have been throttled, as reported by the indices stats API. Note that this is not Wall clock time.  These metrics records also have a ``per-shard`` property that contains the times across primary shards in an array.
+* ``indexing_total_time``: Cumulative time used for indexing of primary shards, as reported by the indices stats API. Note that this is not Wall clock time.  These metrics records also have a ``per-shard`` property that contains the times across primary shards in an array.
+* ``indexing_throttle_time``: Cumulative time that indexing has been throttled, as reported by the indices stats API. Note that this is not Wall clock time.  These metrics records also have a ``per-shard`` property that contains the times across primary shards in an array.
+* ``refresh_total_time``: Cumulative time used for index refresh of primary shards, as reported by the indices stats API. Note that this is not Wall clock time.  These metrics records also have a ``per-shard`` property that contains the times across primary shards in an array.
+* ``refresh_total_count``: Cumulative number of refreshes of primary shards, as reported by indices stats API under ``_all/primaries``.
+* ``flush_total_time``: Cumulative time used for index flush of primary shards, as reported by the indices stats API. Note that this is not Wall clock time.  These metrics records also have a ``per-shard`` property that contains the times across primary shards in an array.
+* ``flush_total_count``: Cumulative number of flushes of primary shards, as reported by indices stats API under ``_all/primaries``.
 * ``final_index_size_bytes``: Final resulting index size on the file system after all nodes have been shutdown at the end of the benchmark. It includes all files in the nodes' data directories (actual index files and translog).
-* ``store_size_in_bytes``: The size in bytes of the index (excluding the translog) as reported by the indices stats API.
-* ``translog_size_in_bytes``: The size in bytes of the translog as reported by the indices stats API.
+* ``store_size_in_bytes``: The size in bytes of the index (excluding the translog), as reported by the indices stats API.
+* ``translog_size_in_bytes``: The size in bytes of the translog, as reported by the indices stats API.
 * ``ml_processing_time``: A structure containing the minimum, mean, median and maximum bucket processing time in milliseconds per machine learning job. These metrics are only available if a machine learning job has been created in the respective benchmark.
diff --git a/docs/quickstart.rst b/docs/quickstart.rst
@@ -25,9 +25,10 @@ Run your first race
 
 Now we're ready to run our first :doc:`race </glossary>`::
 
-    esrally --distribution-version=6.0.0
+    esrally --distribution-version=6.5.3
+
+This will download Elasticsearch 6.5.3 and run Rally's default :doc:`track </glossary>` - the `geonames track <https://github.com/elastic/rally-tracks/tree/master/geonames>`_ - against it. After the race, a :doc:`summary report </summary_report>` is written to the command line:::
 
-This will download Elasticsearch 6.0.0 and run Rally's default :doc:`track </glossary>` - the `geonames track <https://github.com/elastic/rally-tracks/tree/master/geonames>`_ - against it. After the race, a :doc:`summary report </summary_report>` is written to the command line:::
 
     ------------------------------------------------------
         _______             __   _____
@@ -37,54 +38,81 @@ This will download Elasticsearch 6.0.0 and run Rally's default :doc:`track </glo
     /_/   /_/_/ /_/\__,_/_/   /____/\___/\____/_/   \___/
     ------------------------------------------------------
 
-    |                         Metric |                 Task |     Value |   Unit |
-    |-------------------------------:|---------------------:|----------:|-------:|
-    |            Total indexing time |                      |   28.0997 |    min |
-    |               Total merge time |                      |   6.84378 |    min |
-    |             Total refresh time |                      |   3.06045 |    min |
-    |               Total flush time |                      |  0.106517 |    min |
-    |      Total merge throttle time |                      |   1.28193 |    min |
-    |               Median CPU usage |                      |     471.6 |      % |
-    |             Total Young Gen GC |                      |    16.237 |      s |
-    |               Total Old Gen GC |                      |     1.796 |      s |
-    |                     Index size |                      |   2.60124 |     GB |
-    |                Totally written |                      |   11.8144 |     GB |
-    |         Heap used for segments |                      |   14.7326 |     MB |
-    |       Heap used for doc values |                      |  0.115917 |     MB |
-    |            Heap used for terms |                      |   13.3203 |     MB |
-    |            Heap used for norms |                      | 0.0734253 |     MB |
-    |           Heap used for points |                      |    0.5793 |     MB |
-    |    Heap used for stored fields |                      |  0.643608 |     MB |
-    |                  Segment count |                      |        97 |        |
-    |                 Min Throughput |         index-append |   31925.2 | docs/s |
-    |              Median Throughput |         index-append |   39137.5 | docs/s |
-    |                 Max Throughput |         index-append |   39633.6 | docs/s |
-    |      50.0th percentile latency |         index-append |   872.513 |     ms |
-    |      90.0th percentile latency |         index-append |   1457.13 |     ms |
-    |      99.0th percentile latency |         index-append |   1874.89 |     ms |
-    |       100th percentile latency |         index-append |   2711.71 |     ms |
-    | 50.0th percentile service time |         index-append |   872.513 |     ms |
-    | 90.0th percentile service time |         index-append |   1457.13 |     ms |
-    | 99.0th percentile service time |         index-append |   1874.89 |     ms |
-    |  100th percentile service time |         index-append |   2711.71 |     ms |
-    |                           ...  |                  ... |       ... |    ... |
-    |                           ...  |                  ... |       ... |    ... |
-    |                 Min Throughput |     painless_dynamic |   2.53292 |  ops/s |
-    |              Median Throughput |     painless_dynamic |   2.53813 |  ops/s |
-    |                 Max Throughput |     painless_dynamic |   2.54401 |  ops/s |
-    |      50.0th percentile latency |     painless_dynamic |    172208 |     ms |
-    |      90.0th percentile latency |     painless_dynamic |    310401 |     ms |
-    |      99.0th percentile latency |     painless_dynamic |    341341 |     ms |
-    |      99.9th percentile latency |     painless_dynamic |    344404 |     ms |
-    |       100th percentile latency |     painless_dynamic |    344754 |     ms |
-    | 50.0th percentile service time |     painless_dynamic |    393.02 |     ms |
-    | 90.0th percentile service time |     painless_dynamic |   407.579 |     ms |
-    | 99.0th percentile service time |     painless_dynamic |   430.806 |     ms |
-    | 99.9th percentile service time |     painless_dynamic |   457.352 |     ms |
-    |  100th percentile service time |     painless_dynamic |   459.474 |     ms |
+    |   Lap |                                                          Metric |                   Task |     Value |    Unit |
+    |------:|----------------------------------------------------------------:|-----------------------:|----------:|--------:|
+    |   All |                      Cumulative indexing time of primary shards |                        |   54.5878 |     min |
+    |   All |              Min cumulative indexing time across primary shards |                        |   10.7519 |     min |
+    |   All |           Median cumulative indexing time across primary shards |                        |   10.9219 |     min |
+    |   All |              Max cumulative indexing time across primary shards |                        |   11.1754 |     min |
+    |   All |             Cumulative indexing throttle time of primary shards |                        |         0 |     min |
+    |   All |     Min cumulative indexing throttle time across primary shards |                        |         0 |     min |
+    |   All |  Median cumulative indexing throttle time across primary shards |                        |         0 |     min |
+    |   All |     Max cumulative indexing throttle time across primary shards |                        |         0 |     min |
+    |   All |                         Cumulative merge time of primary shards |                        |   20.4128 |     min |
+    |   All |                        Cumulative merge count of primary shards |                        |       136 |         |
+    |   All |                 Min cumulative merge time across primary shards |                        |   3.82548 |     min |
+    |   All |              Median cumulative merge time across primary shards |                        |    4.1088 |     min |
+    |   All |                 Max cumulative merge time across primary shards |                        |   4.38148 |     min |
+    |   All |                Cumulative merge throttle time of primary shards |                        |   1.17975 |     min |
+    |   All |        Min cumulative merge throttle time across primary shards |                        |    0.1169 |     min |
+    |   All |     Median cumulative merge throttle time across primary shards |                        |   0.26585 |     min |
+    |   All |        Max cumulative merge throttle time across primary shards |                        |  0.291033 |     min |
+    |   All |                       Cumulative refresh time of primary shards |                        |    7.0317 |     min |
+    |   All |                      Cumulative refresh count of primary shards |                        |       420 |         |
+    |   All |               Min cumulative refresh time across primary shards |                        |   1.37088 |     min |
+    |   All |            Median cumulative refresh time across primary shards |                        |    1.4076 |     min |
+    |   All |               Max cumulative refresh time across primary shards |                        |   1.43343 |     min |
+    |   All |                         Cumulative flush time of primary shards |                        |  0.599417 |     min |
+    |   All |                        Cumulative flush count of primary shards |                        |        10 |         |
+    |   All |                 Min cumulative flush time across primary shards |                        | 0.0946333 |     min |
+    |   All |              Median cumulative flush time across primary shards |                        |  0.118767 |     min |
+    |   All |                 Max cumulative flush time across primary shards |                        |   0.14145 |     min |
+    |   All |                                                Median CPU usage |                        |     284.4 |       % |
+    |   All |                                              Total Young Gen GC |                        |    12.868 |       s |
+    |   All |                                                Total Old Gen GC |                        |     3.803 |       s |
+    |   All |                                                      Store size |                        |   3.17241 |      GB |
+    |   All |                                                   Translog size |                        |   2.62736 |      GB |
+    |   All |                                                      Index size |                        |   5.79977 |      GB |
+    |   All |                                                 Totally written |                        |   22.8536 |      GB |
+    |   All |                                          Heap used for segments |                        |   18.8885 |      MB |
+    |   All |                                        Heap used for doc values |                        | 0.0322647 |      MB |
+    |   All |                                             Heap used for terms |                        |   17.7184 |      MB |
+    |   All |                                             Heap used for norms |                        | 0.0723877 |      MB |
+    |   All |                                            Heap used for points |                        |  0.277171 |      MB |
+    |   All |                                     Heap used for stored fields |                        |  0.788307 |      MB |
+    |   All |                                                   Segment count |                        |        94 |         |
+    |   All |                                                  Min Throughput |           index-append |   38089.5 |  docs/s |
+    |   All |                                               Median Throughput |           index-append |   38613.9 |  docs/s |
+    |   All |                                                  Max Throughput |           index-append |   40693.3 |  docs/s |
+    |   All |                                         50th percentile latency |           index-append |   803.417 |      ms |
+    |   All |                                         90th percentile latency |           index-append |    1913.7 |      ms |
+    |   All |                                         99th percentile latency |           index-append |   3591.23 |      ms |
+    |   All |                                       99.9th percentile latency |           index-append |   6176.23 |      ms |
+    |   All |                                        100th percentile latency |           index-append |   6642.97 |      ms |
+    |   All |                                    50th percentile service time |           index-append |   803.417 |      ms |
+    |   All |                                    90th percentile service time |           index-append |    1913.7 |      ms |
+    |   All |                                    99th percentile service time |           index-append |   3591.23 |      ms |
+    |   All |                                  99.9th percentile service time |           index-append |   6176.23 |      ms |
+    |   All |                                   100th percentile service time |           index-append |   6642.97 |      ms |
+    |   All |                                                      error rate |           index-append |         0 |       % |
+    |   All |                                                            ...  |                    ... |       ... |     ... |
+    |   All |                                                            ...  |                    ... |       ... |     ... |
+    |   All |                                                  Min Throughput | large_prohibited_terms |         2 |   ops/s |
+    |   All |                                               Median Throughput | large_prohibited_terms |         2 |   ops/s |
+    |   All |                                                  Max Throughput | large_prohibited_terms |         2 |   ops/s |
+    |   All |                                         50th percentile latency | large_prohibited_terms |   344.429 |      ms |
+    |   All |                                         90th percentile latency | large_prohibited_terms |   353.187 |      ms |
+    |   All |                                         99th percentile latency | large_prohibited_terms |    377.22 |      ms |
+    |   All |                                        100th percentile latency | large_prohibited_terms |   392.918 |      ms |
+    |   All |                                    50th percentile service time | large_prohibited_terms |   341.177 |      ms |
+    |   All |                                    90th percentile service time | large_prohibited_terms |   349.979 |      ms |
+    |   All |                                    99th percentile service time | large_prohibited_terms |   374.958 |      ms |
+    |   All |                                   100th percentile service time | large_prohibited_terms |    388.62 |      ms |
+    |   All |                                                      error rate | large_prohibited_terms |         0 |       % |
+
 
     ----------------------------------
-    [INFO] SUCCESS (took 2634 seconds)
+    [INFO] SUCCESS (took 1862 seconds)
     ----------------------------------
 
 
@@ -94,4 +122,3 @@ Next steps
 Now you can check :doc:`how to run benchmarks </race>`, get a better understanding how to interpret the numbers in the :doc:`summary report </summary_report>` or start to :doc:`create your own tracks </adding_tracks>`. Be sure to check also some :doc:`tips and tricks </recipes>` to help you understand how to solve specific problems in Rally.
 
 Also run ``esrally --help`` to see what options are available and keep the :doc:`command line reference </command_line_reference>` handy for more detailed explanations of each option.
-
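The summary rows above report a min, median and max across primary shards, and the commit message mentions a fix for stats whose time or count is 0 being skipped. The sketch below illustrates both points under stated assumptions: the function names are hypothetical, and the truthiness check shown as the pitfall is only one plausible cause of such a bug, not a confirmed description of the original code.

```python
import statistics


def summarize_per_shard(per_shard_values):
    """Return min/median/max over per-shard values, or None if there are none.

    `per_shard_values` stands in for the ``per-shard`` array described in
    docs/metrics.rst. Hypothetical helper for illustration.
    """
    if not per_shard_values:  # None or empty list: nothing to summarize
        return None
    return {
        "min": min(per_shard_values),
        "median": statistics.median(per_shard_values),
        "max": max(per_shard_values),
    }


def should_report(value):
    """Decide whether a metric value belongs in the summary.

    A plain `if value:` would drop a legitimate 0 (for example, zero
    throttle time), so the check compares against None explicitly.
    """
    return value is not None


# All shards report zero throttle time; the summary is still produced.
zero_summary = summarize_per_shard([0, 0, 0])
mixed_summary = summarize_per_shard([3, 1, 2])
```

With this shape, a row such as "Cumulative indexing throttle time of primary shards | 0 | min" is still emitted, while a metric that was never collected (`None`) is left out.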