Allow conflict-probability value of 0 #510

Merged: 3 commits were merged into elastic:master on May 24, 2018.


@dliappis (Contributor):

For the use case of externally generated ids (no updates), allow
setting conflict-probability to the value 0.
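A minimal sketch of this use case, written as the Python equivalent of a bulk operation entry in a track (a track file would hold the same keys as JSON). Only `conflicts` and `conflict-probability` come from this PR; the operation name and the `bulk-size` value are illustrative assumptions.

```python
import json

# Hypothetical bulk operation for indexing with externally generated ids.
# The name and bulk size are illustrative; "conflicts" and
# "conflict-probability" are the parameters discussed in this PR.
operation = {
    "name": "bulk-index-external-ids",
    "operation-type": "bulk",
    "bulk-size": 5000,
    # Sequential conflicts with a probability of 0 percent: Rally assigns the
    # ids itself, but never reuses one, so no document is updated.
    "conflicts": "sequential",
    "conflict-probability": 0,
}

print(json.dumps(operation, indent=2))
```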

@dliappis added the bug, enhancement and :Docs labels on May 24, 2018
@dliappis added this to the 0.12.0 milestone on May 24, 2018
@danielmitterdorfer (Member) left a review:

Looks promising already. I left a few minor comments about the docs and a suggestion to improve one of the new test cases.

docs/track.rst (outdated)
-* ``conflicts`` (optional): Type of index conflicts to simulate. If not specified, no conflicts will be simulated. Valid values are: 'sequential' (A document id is replaced with a document id with a sequentially increasing id), 'random' (A document id is replaced with a document id with a random other id).
-* ``conflict-probability`` (optional, defaults to 25 percent): A number between (0, 100] that defines how many of the documents will get replaced.
+* ``conflicts`` (optional): Type of index conflicts to simulate. If not specified, no conflicts will be simulated (also read below on how to use non-autogenerated index ids with no conflicts). Valid values are: 'sequential' (A document id is replaced with a document id with a sequentially increasing id), 'random' (A document id is replaced with a document id with a random other id).
+* ``conflict-probability`` (optional, defaults to 25 percent): A number between [0, 100] that defines how many of the documents will get replaced. Combining ``conflicts=sequential`` and ``conflict-probability=0`` makes Rally generate index ids by itself, instead of relying on Elasticsearch's `Automatic ID Generation <https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#_automatic_id_generation>`_.
@danielmitterdorfer (Member):

"Automatic ID Generation" -> "automatic id generation"?
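The `conflicts` / `conflict-probability` semantics documented in this diff can be illustrated with a small Python sketch. It is not Rally's implementation: the function name, the seeded random generator, and the use of `None` to mean "leave the id to Elasticsearch's automatic ID generation" are assumptions made for illustration.

```python
import random


def assign_doc_ids(num_docs, conflicts=None, conflict_probability=25, seed=0):
    """Hypothetical sketch of conflict simulation; not Rally's implementation.

    Each document normally gets a fresh id. With `conflict_probability`
    percent chance it instead reuses an id issued earlier, which turns the
    index request into an update of an existing document.
    """
    if not 0 <= conflict_probability <= 100:  # closed interval [0, 100]
        raise ValueError("conflict-probability must be in [0, 100]")
    if conflicts not in (None, "sequential", "random"):
        raise ValueError("conflicts must be None, 'sequential' or 'random'")
    if conflicts is None:
        # No conflicts requested: leave ids to Elasticsearch (None = no id sent).
        return [None] * num_docs
    rand = random.Random(seed)
    issued = []     # fresh ids handed out so far
    result = []     # id used for each document, in order
    next_reuse = 0  # position of the next id to reuse for 'sequential' conflicts
    for _ in range(num_docs):
        conflict = len(issued) > 0 and rand.random() * 100 < conflict_probability
        if conflict and conflicts == "sequential":
            # Reuse previously issued ids in increasing order.
            doc_id = issued[next_reuse % len(issued)]
            next_reuse += 1
        elif conflict:
            # Reuse a randomly chosen, previously issued id.
            doc_id = rand.choice(issued)
        else:
            doc_id = str(len(issued))
            issued.append(doc_id)
        result.append(doc_id)
    return result


# With a probability of 0, every document keeps its own id: no updates.
print(assign_doc_ids(4, "sequential", 0))
```

With `conflict-probability=0` the random draw can never fall below 0, so every document receives a fresh, client-generated id; this mirrors the "Rally generates index ids by itself" behavior the diff describes.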

docs/track.rst (outdated)
@@ -314,8 +314,8 @@ With the operation type ``bulk`` you can execute `bulk requests <http://www.elas
 * ``indices`` (optional): A list of index names that defines which indices should be used by this bulk-index operation. Rally will then only select the documents files that have a matching ``target-index`` specified.
 * ``batch-size`` (optional): Defines how many documents Rally will read at once. This is an expert setting and only meant to avoid accidental bottlenecks for very small bulk sizes (e.g. if you want to benchmark with a bulk-size of 1, you should set ``batch-size`` higher).
 * ``pipeline`` (optional): Defines the name of an (existing) ingest pipeline that should be used (only supported from Elasticsearch 5.0).
-* ``conflicts`` (optional): Type of index conflicts to simulate. If not specified, no conflicts will be simulated. Valid values are: 'sequential' (A document id is replaced with a document id with a sequentially increasing id), 'random' (A document id is replaced with a document id with a random other id).
-* ``conflict-probability`` (optional, defaults to 25 percent): A number between (0, 100] that defines how many of the documents will get replaced.
+* ``conflicts`` (optional): Type of index conflicts to simulate. If not specified, no conflicts will be simulated (also read below on how to use non-autogenerated index ids with no conflicts). Valid values are: 'sequential' (A document id is replaced with a document id with a sequentially increasing id), 'random' (A document id is replaced with a document id with a random other id).
@danielmitterdorfer (Member):

Can we refer to "non-autogenerated ids" as "external ids" instead?

@dliappis (Contributor Author):

Sure. I googled everywhere but hadn't seen the term "external ids" anywhere, which is why I opted for this wording :)

self.assertEqual(idx("100"), next(generator))
self.assertEqual(idx("200"), next(generator))
self.assertEqual(idx("300"), next(generator))
self.assertEqual(idx("400"), next(generator))
@danielmitterdorfer (Member):

That leaves open the possibility that the generator creates more entries than you expect. Could you convert it to a list and compare the list contents, to be sure it generates only four elements?
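The suggestion can be sketched as a self-contained `unittest` case. Both `idx` and the generator below are hypothetical stand-ins for the helpers in the real test file, whose exact output format may differ; the point is that `list()` exhausts the generator before comparing.

```python
import unittest


def idx(doc_id):
    # Hypothetical stand-in for the test helper `idx`: builds the bulk "index"
    # action line for a document id (the real helper may format it differently).
    return '{"index": {"_id": "%s"}}' % doc_id


def actions_with_external_ids(count):
    # Hypothetical generator under test: yields one action per id 100, 200, ...
    for i in range(1, count + 1):
        yield idx(str(i * 100))


class ZeroConflictProbabilityTests(unittest.TestCase):
    def test_generates_exactly_four_actions(self):
        generator = actions_with_external_ids(4)
        # list() exhausts the generator, so a surplus fifth element would make
        # this comparison fail instead of going unnoticed, as it would with
        # four separate next() calls.
        self.assertEqual([idx("100"), idx("200"), idx("300"), idx("400")],
                         list(generator))
```

Run it with `python -m unittest`; the single `assertEqual` on the whole list checks both the contents and the length.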

@dliappis (Contributor Author):

Great idea! Addressed in 400b170

* Improve test case to match the complete list of generated documents
  with 0 conflict probability

* Doc fixes
@dliappis (Contributor Author):

@danielmitterdorfer I addressed the comments in 400b170

@danielmitterdorfer (Member) left a review:

There is one typo; apart from that, LGTM. No need for another review cycle.

docs/track.rst (outdated)
-* ``conflicts`` (optional): Type of index conflicts to simulate. If not specified, no conflicts will be simulated (also read below on how to use non-autogenerated index ids with no conflicts). Valid values are: 'sequential' (A document id is replaced with a document id with a sequentially increasing id), 'random' (A document id is replaced with a document id with a random other id).
-* ``conflict-probability`` (optional, defaults to 25 percent): A number between [0, 100] that defines how many of the documents will get replaced. Combining ``conflicts=sequential`` and ``conflict-probability=0`` makes Rally generate index ids by itself, instead of relying on Elasticsearch's `Automatic ID Generation <https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#_automatic_id_generation>`_.
+* ``conflicts`` (optional): Type of index conflicts to simulate. If not specified, no conflicts will be simulated (also read below on how to use external index ids with no conflicts). Valid values are: 'sequential' (A document id is replaced with a document id with a sequentially increasing id), 'random' (A document id is replaced with a document id with a random other id).
+* ``conflict-probability`` (optional, defaults to 25 percent): A number between [0, 100] that defines how many of the documents will get replaced. Combining ``conflicts=sequential`` and ``conflict-probability=0`` makes Rally generate index ids by itself, instead of relying on Elasticsearch's `automatic iD generation <https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-index_.html#_automatic_id_generation>`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"iD" -> "ID"

@dliappis (Contributor Author):

Ugh

@dliappis (Contributor Author):

Addressed in f00b32e

@dliappis merged commit 03971a9 into elastic:master on May 24, 2018
@dliappis deleted the allow-zero-conflict-probability branch on May 24, 2018, 15:35
@dliappis (Contributor Author):

Thanks @danielmitterdorfer !

@danielmitterdorfer (Member):

@dliappis can we settle for either "bug" or "enhancement" for this PR?

@dliappis (Contributor Author):

@danielmitterdorfer I thought it was a bit of both :) However, I realized that something is missing here, namely the checks in float_param, so I'll label this one as a bug (fix) for now and make the follow-up PR an enhancement.

@dliappis changed the labels on May 25, 2018 to bug and :Load Driver, removing enhancement and :Docs
dliappis added a commit to dliappis/rally that referenced this pull request May 25, 2018
PR elastic#510 doesn't allow setting conflict probability to 0 as the
parameter check is failing, assuming only open intervals are allowed.

Fix bug and also add a number of unit tests.
dliappis added a commit to dliappis/rally that referenced this pull request May 25, 2018
For the use case of externally generated ids (no updates), allow
setting conflict-probability to the value 0.

Relates elastic#510
dliappis added a commit that referenced this pull request May 25, 2018
PR #510 doesn't allow setting conflict probability to 0 as the
parameter check is failing, assuming only open intervals are allowed.

Fix bug and also add a number of unit tests.

Relates #511
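The commit messages above attribute the failure to a parameter check that assumed an open interval. As a hypothetical sketch (not Rally's `float_param`, whose actual signature may differ), a range check that lets the caller choose open or closed bounds could look like this:

```python
def checked_float_param(params, key, default, lower, upper,
                        lower_inclusive=True, upper_inclusive=True):
    """Hypothetical helper (not Rally's `float_param`): read and range-check a float.

    The bug corresponds to checking the lower bound as an open interval
    (value > 0) even though 0 is a legal conflict probability.
    """
    value = float(params.get(key, default))
    above = value >= lower if lower_inclusive else value > lower
    below = value <= upper if upper_inclusive else value < upper
    if not (above and below):
        interval = "%s%s, %s%s" % ("[" if lower_inclusive else "(", lower,
                                   upper, "]" if upper_inclusive else ")")
        raise ValueError("'%s' must be in %s but was %s" % (key, interval, value))
    return value


params = {"conflict-probability": 0}
# Closed interval [0, 100]: 0 is accepted.
print(checked_float_param(params, "conflict-probability", 25, 0, 100))
```

Making the bound type an explicit argument keeps the documented interval (`[0, 100]`) and the validation in the code from drifting apart again.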
@danielmitterdorfer removed this from the 0.12.0 milestone on May 25, 2018
@danielmitterdorfer added this to the 1.0.0 milestone on May 25, 2018
Labels: bug, :Load Driver

2 participants