
Allow to use significantly more clients #944

Merged
merged 10 commits into elastic:master from danielmitterdorfer:multicore-async on Apr 6, 2020

Conversation

@danielmitterdorfer (Member) commented Apr 1, 2020

With this commit we change how clients are assigned to worker processes in the
load generator. While historically Rally has assigned one worker (process) to
one client, with the changes done in #935 we can now assign multiple clients to a
single worker process and run the clients in an asyncio event loop. This
allows us to simulate many more clients than is possible with the process-based
approach, which is very heavy-weight.
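To illustrate the idea, here is a minimal sketch (not Rally's actual worker
implementation): a single worker process drives many clients as coroutines on one
asyncio event loop instead of spawning one process per client.

```python
import asyncio

async def run_client(client_id):
    # Stand-in for a client's schedule: issue a request, wait, repeat.
    await asyncio.sleep(0.01)
    return client_id

async def run_worker(client_ids):
    # All clients assigned to this worker share a single event loop, so
    # thousands of clients cost only coroutines instead of processes.
    return await asyncio.gather(*(run_client(c) for c in client_ids))

if __name__ == "__main__":
    asyncio.run(run_worker(range(1000)))
```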

By default Rally will only create as many workers as there are CPUs available on
the system. This choice can be overridden in Rally's config file with the
configuration setting `available.cores` in the section `system` to allocate more
or fewer workers. If multiple load generator machines are used, note that we assume
that all of them have the same hardware specifications and we only take the
coordinating machine's CPU count into account.
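For example, the override could look like this in Rally's configuration file
(a sketch assuming the usual INI layout of ~/.rally/rally.ini; the value 16 is
purely illustrative):

```ini
[system]
# Illustrative override: allocate up to 16 workers on this load generator
# instead of the number of CPUs Rally detects.
available.cores = 16
```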

@danielmitterdorfer danielmitterdorfer added enhancement Improves the status quo :Load Driver Changes that affect the core of the load driver such as scheduling, the measurement approach etc. highlight A substantial improvement that is worth mentioning separately in release notes labels Apr 1, 2020
@danielmitterdorfer danielmitterdorfer self-assigned this Apr 1, 2020
@hub-cap (Contributor) commented Apr 1, 2020

@elasticmachine run tests

1 similar comment
@hub-cap (Contributor) commented Apr 2, 2020

@elasticmachine run tests

@danielmitterdorfer danielmitterdorfer marked this pull request as ready for review April 2, 2020 12:59
@danielmitterdorfer danielmitterdorfer added this to the 1.5.0 milestone Apr 2, 2020
@dliappis (Contributor) left a comment


This is a surprisingly smooth addition leveraging the actor system to allow
us to increase the number of clients!

Unrelated to this specific PR, but since #935, where we introduced the MmapSource, it appears that with ~4000 clients (on a single machine) the reported memory usage (from top and pmap) is ~1.1TB, so we likely need to look at this in more detail in a follow-up.

@@ -431,12 +434,19 @@ def prepare_benchmark(self, t):
self.prepare_telemetry(es_clients)
self.target.on_cluster_details_retrieved(self.retrieve_cluster_info(es_clients))
for host in self.config.opts("driver", "load_driver_hosts"):
host_config = {
# for simplicity we assume that all benchmark machines have the same specs
Contributor

👍

self.config, self.client_id, task, schedule, self.sampler, self.cancel, self.complete, self.abort_on_error)
self.logger.info("Worker[%d] is executing tasks at index [%d].", self.worker_id, self.current_task_index)
# allow to buffer more events than by default as we expect to have way more clients.
self.sampler = Sampler(start_timestamp=time.perf_counter(), buffer_size=65536)
Contributor

Does it make sense to declare static values like 65536 via a module variable?
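Something along these lines (the constant name is hypothetical, not what the code ended up using):

```python
# Hypothetical module-level constant instead of the inline literal.
WORKER_SAMPLE_BUFFER_SIZE = 65536

# ...and at the call site shown in the diff above:
# self.sampler = Sampler(start_timestamp=time.perf_counter(),
#                        buffer_size=WORKER_SAMPLE_BUFFER_SIZE)
```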

self.client_id, self.sub_task, self.schedule, es, self.sampler, self.cancel, self.complete, self.abort_on_error)
final_executor = AsyncProfiler(async_executor) if self.profiling_enabled else async_executor

aws = []
Contributor

nit: it's not easy to understand what this variable holds judging by its name (plus the name inadvertently conjures thoughts about cloud)
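For illustration (assuming the list holds awaitables that are later passed to asyncio.gather, which the short name appears to abbreviate), a more explicit name avoids the cloud connotation:

```python
import asyncio

async def execute_schedule(client_id):
    # Stand-in for executing one client's schedule.
    await asyncio.sleep(0)
    return client_id

async def run_clients(client_ids):
    # Spell out "awaitables" instead of the terse "aws".
    awaitables = [execute_schedule(c) for c in client_ids]
    return await asyncio.gather(*awaitables)

asyncio.run(run_clients(range(4)))
```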

@danielmitterdorfer (Member, Author)

@elasticmachine test this please

@danielmitterdorfer danielmitterdorfer merged commit 8304b64 into elastic:master Apr 6, 2020
danielmitterdorfer added a commit to danielmitterdorfer/rally that referenced this pull request Apr 8, 2020
With this commit we provide a URL object when issuing a request instead
of a string representation to avoid expensive string parsing in aiohttp.
In our tests this has reduced the client side overhead by about one
millisecond which is important when benchmarking queries which finish
within single-digit milliseconds.

Relates elastic#944
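As a rough illustration of that optimization (a sketch with a hypothetical endpoint, not the actual Rally client code): aiohttp accepts yarl.URL objects, so constructing the URL once avoids re-parsing the string on every request.

```python
import asyncio
import aiohttp
from yarl import URL

async def main():
    # Hypothetical endpoint; parse once, reuse the URL object for every request.
    search_url = URL("http://localhost:9200/geonames/_search")
    async with aiohttp.ClientSession() as session:
        for _ in range(10):
            async with session.get(search_url) as resp:
                await resp.read()

asyncio.run(main())
```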
danielmitterdorfer added a commit that referenced this pull request Apr 14, 2020
With this commit we provide a URL object when issuing a request instead
of a string representation to avoid expensive string parsing in aiohttp.
In our tests this has reduced the client side overhead by about one
millisecond which is important when benchmarking queries which finish
within single-digit milliseconds.

Relates #944
danielmitterdorfer added a commit to danielmitterdorfer/rally that referenced this pull request Apr 14, 2020
With this commit we properly determine the most recent sample per client
by looping over all clients instead of extracting only one sample. The
reason this was working before is that each process only simulated one
client, but with elastic#944 each worker can simulate multiple clients, so we
need to check all samples to determine the most recent one.
danielmitterdorfer added a commit that referenced this pull request Apr 16, 2020
With this commit we properly determine the most recent sample per client
by looping over all clients instead of extracting only one sample. The
reason this was working before is that each process only simulated one
client, but with #944 each worker can simulate multiple clients, so we
need to check all samples to determine the most recent one.
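A minimal sketch of the idea behind that fix (field and variable names are hypothetical, not Rally's actual ones): instead of looking only at one client's latest sample, take the maximum over the latest sample of every client the worker simulates.

```python
def most_recent_sample(samples_by_client):
    # samples_by_client: client id -> list of samples in arrival order.
    latest = None
    for samples in samples_by_client.values():
        if not samples:
            continue
        candidate = samples[-1]
        if latest is None or candidate["absolute_time"] > latest["absolute_time"]:
            latest = candidate
    return latest

# The worker simulates three clients; the overall most recent sample is the one
# with the largest timestamp across all of them.
print(most_recent_sample({
    0: [{"absolute_time": 10.0}, {"absolute_time": 12.5}],
    1: [{"absolute_time": 12.9}],
    2: [],
}))
```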
@danielmitterdorfer danielmitterdorfer deleted the multicore-async branch April 16, 2020 10:12
@danielmitterdorfer (Member, Author)

> Unrelated to this specific PR, but since #935, where we introduced the MmapSource, it appears that with ~4000 clients (on a single machine) the reported memory usage (from top and pmap) is ~1.1TB, so we likely need to look at this in more detail in a follow-up.

#963 addresses this by sharing the memory-mapped file within each process. This means we memory-map a data file only once per worker (process) instead of once per client.
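A rough sketch of that idea (not the actual implementation in #963): cache the memory map per data file within a worker process so every client in that process reuses the same mapping.

```python
import mmap
from functools import lru_cache

@lru_cache(maxsize=None)
def shared_memory_map(file_name):
    # One mapping per data file per worker process; all clients in this
    # process share it instead of mapping the file once per client.
    with open(file_name, "rb") as f:
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```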
