Ensure that serialized data is measured correctly #7593

Merged

Conversation

@fjetter (Member) commented Feb 28, 2023

Closes #7589

TODO: Tests are still missing / not done yet.

def test_sizeof_serialize(Wrap, Wrapped):
    size = 100_000
    ser_obj = Wrap(b"0" * size)
    assert size <= sizeof(ser_obj) < size * 1.05

fjetter (Member Author):

Allowing 5% overhead is a bit generous, but this test doesn't really care whether the wrapper adds more metadata or whether Python object sizes change, etc.
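
For context, a minimal sketch of the kind of sizeof registration the test above exercises (this is not the actual PR diff; the dispatch targets and the small wrapper-overhead term are assumptions):

from dask.sizeof import sizeof

from distributed.protocol.serialize import Serialize, Serialized


@sizeof.register(Serialize)
def _sizeof_serialize(obj):
    # Report roughly the size of the wrapped payload so the comm layer can
    # decide whether serialization should be offloaded to the "Dask-Offload"
    # thread; a small constant covers the wrapper object itself.
    return sizeof(obj.data) + sizeof(object())


@sizeof.register(Serialized)
def _sizeof_serialized(obj):
    # For already-serialized data, count the header plus the raw frame bytes.
    return sizeof(obj.header) + sum(map(len, obj.frames)) + sizeof(object())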

github-actions bot commented Feb 28, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

26 files ±0 · 26 suites ±0 · 12h 50m 14s ⏱️ +33m 33s
3 502 tests +5 · 3 394 ✔️ +1 · 103 💤 ±0 · 5 ❌ +4
44 267 runs +66 · 42 190 ✔️ +58 · 2 072 💤 +4 · 5 ❌ +4

For more details on these failures, see this check.

Results for commit 52eee9e. ± Comparison against base commit 700f14a.

♻️ This comment has been updated with latest results.

@fjetter
Copy link
Member Author

fjetter commented Feb 28, 2023

Apparently, there is something wrong with the offload ThreadPoolExecutor (TPE). I get spurious errors like:

Traceback (most recent call last):
  File "/home/runner/work/distributed/distributed/distributed/core.py", line 820, in _handle_comm
    result = await result
  File "/home/runner/work/distributed/distributed/distributed/worker.py", line 1795, in get_data
    compressed = await comm.write(msg, serializers=serializers)
  File "/home/runner/work/distributed/distributed/distributed/comm/tcp.py", line 271, in write
    frames = await to_frames(
  File "/home/runner/work/distributed/distributed/distributed/comm/utils.py", line 70, in to_frames
    return await offload(_to_frames)
  File "/home/runner/work/distributed/distributed/distributed/utils.py", line 1417, in offload
    return await loop.run_in_executor(
  File "/usr/share/miniconda3/envs/dask-distributed/lib/python3.8/asyncio/base_events.py", line 783, in run_in_executor
    executor.submit(func, *args), loop=self)
  File "/usr/share/miniconda3/envs/dask-distributed/lib/python3.8/concurrent/futures/thread.py", line 179, in submit
    raise RuntimeError('cannot schedule new futures after shutdown')
RuntimeError: cannot schedule new futures after shutdown

@@ -1390,7 +1390,6 @@ def is_valid_xml(text):


 _offload_executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="Dask-Offload")
-weakref.finalize(_offload_executor, _offload_executor.shutdown)

@fjetter (Member Author), Feb 28, 2023:

This is just an attempt to fix the error I'm seeing.

Two reasons why I believe this line should be removed regardless of whether it fixes anything:

  1. All Python versions 3.8+ already ensure that worker threads terminate on interpreter shutdown. They explicitly handle collected executors, interpreter shutdown, and instance shutdown identically.
  2. Judging by the weakref.finalize docs, I'm not even sure this callback is ever triggered:

"A finalizer will never invoke its callback during the later part of the interpreter shutdown when module globals are liable to have been replaced by None."

Since _offload_executor is a module global, it cannot be garbage collected (and therefore finalized) unless it has been replaced by None during interpreter shutdown, and at that point the finalizer no longer invokes its callback.

fjetter (Member Author):

This turned out not to be responsible after all, but I still suggest removing this line.

Contributor:

Re: "Judging by the finalize docs I'm not even sure if this callback is ever triggered"

It's interesting that later in the docs it states:

"Note: It is important to ensure that func, args and kwargs do not own any references to obj, either directly or indirectly, since otherwise obj will never be garbage collected. In particular, func should not be a bound method of obj."

Notably the last part, which I suppose is another reason this line is unneeded.

fjetter (Member Author):

Yes, that note is interesting as well; i.e., this finalize call is useless for several reasons.
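
For context, a minimal, self-contained sketch of the bound-method problem described in that note (illustrative only, not dask code; the executor below is a stand-in for _offload_executor): because the callback is a bound method of the executor, the finalizer itself keeps the executor alive, so it can never be collected and the callback never fires through garbage collection.

import gc
import weakref
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the module-global offload executor discussed above.
executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="Demo-Offload")
probe = weakref.ref(executor)

# The pattern under discussion: the callback is a bound method of the object
# being tracked, so the finalizer stores a strong reference back to it.
fin = weakref.finalize(executor, executor.shutdown)

del executor
gc.collect()

# The executor is still reachable: finalizer -> executor.shutdown -> executor.
assert probe() is not None
assert fin.alive

On top of that, _offload_executor is a module global, so even without this reference chain it would only become collectable during interpreter shutdown, when (per the docs quoted earlier) the callback would not be invoked anyway.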

fjetter (Member Author):

I opened #7639

Our code base is riddled with this pattern

@crusaderky (Collaborator), Mar 10, 2023:

This one line makes me very nervous. Everything you wrote makes perfect sense, but just in case you're wrong, could you move it to its own PR so that it doesn't end up in the release? It has the potential to leave workers stuck on shutdown, and it could also behave differently on different Python versions and OSs, so I believe some thorough testing is in order.

fjetter commented Feb 28, 2023

There are non-trivial failures, e.g. test_acquire_replicas_large_data raises state machine AssertionErrors ...

@fjetter fjetter force-pushed the ensure_serialized_data_meassured_correctly branch from 043b138 to 4fe360d (March 9, 2023 11:38)
@fjetter fjetter force-pushed the ensure_serialized_data_meassured_correctly branch from 7eb755e to 690ec48 (March 9, 2023 11:47)
Comment on lines -2839 to -2840
finally:
    threadpool.shutdown()

fjetter (Member Author):

Somehow, this actually shut down the real offload threadpool, not just the mock, i.e. nothing in our test suite was using the offload threadpool after this test ran 🤯

fjetter (Member Author):

I do not entirely understand why this was shutting down the actual threadpool, but I don't care: I removed the mock and it works now.

Comment on lines 2809 to 2816
async def custom_worker_offload(func, *args):
    res = func(*args)
    if not istask(args) and istask(res):
        in_deserialize.set()
        await wait_in_deserialize.wait()
    return res

while CountingThreadPool.counter == 0:
    await asyncio.sleep(0)
monkeypatch.setattr("distributed.worker.offload", custom_worker_offload)

fjetter (Member Author):

The test logic is now slightly different but, I believe, more robust. We don't truly care about the offloading itself but rather that there is an await during deserialization. Therefore, I only patch the offload method. To ensure that we're truly in task-spec deserialization, I put in the istask guard above.
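
For readers outside this diff, a rough sketch of the scaffolding the snippet above assumes (the event names match the snippet, but the pytest-asyncio marker and the surrounding test body are illustrative guesses, not the actual dask test):

import asyncio

import pytest
from dask.core import istask


@pytest.mark.asyncio  # assumes pytest-asyncio; the real suite uses dask's own test utilities
async def test_await_during_deserialization(monkeypatch):
    in_deserialize = asyncio.Event()
    wait_in_deserialize = asyncio.Event()

    async def custom_worker_offload(func, *args):
        res = func(*args)
        # Only signal when deserialization turned non-task input into a task
        # spec, i.e. we are inside task-spec deserialization rather than some
        # other offloaded call.
        if not istask(args) and istask(res):
            in_deserialize.set()
            await wait_in_deserialize.wait()
        return res

    monkeypatch.setattr("distributed.worker.offload", custom_worker_offload)

    # ... start a cluster and submit work here, then:
    await in_deserialize.wait()   # the worker is now parked mid-deserialization
    # ... assert on the intermediate worker state ...
    wait_in_deserialize.set()     # let deserialization finish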

fjetter commented Mar 9, 2023

I think test failures are unrelated. @crusaderky care for another look?

fjetter commented Mar 10, 2023

FWIW, benchmark results are available at https://github.com/coiled/coiled-runtime/actions/runs/4384325501 but they don't look particularly interesting. I don't think our tests are very sensitive to this yet. See also coiled/benchmarks#696.

@crusaderky (Collaborator) left a comment:

.


fjetter commented Mar 13, 2023

@crusaderky next time, feel free to just push these minor changes

@crusaderky (Collaborator):

I've started an A/B test with incompressible as well as compressible data, based on coiled/benchmarks#696:
https://github.com/coiled/coiled-runtime/actions/runs/4408363294

@jrbourbeau (Member) left a comment:

Thanks for the fix @fjetter and review @crusaderky

Test failures seem unrelated, but the distributed/shuffle/tests/test_shuffle.py::test_new_worker failure in this build looks interesting. cc @hendrikmakait for visibility

@jrbourbeau jrbourbeau merged commit 6cab0e2 into dask:main Mar 13, 2023

@crusaderky (Collaborator):

I ran the A/B tests and I'm observing a very modest (5%) but consistent speedup in test_filter_then_average. The test uses data that is compressible at 37%. The other tests do not show any kind of change, including those running on data that is compressible at 99%. The reason is that the data compressible at 37% takes 140ms per chunk to compress, whereas identically sized data full of ones takes 14ms per chunk. I'll need to amend the PR that introduces compressible data and rerun the A/B test.
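
As an aside, a rough way to sanity-check that kind of per-chunk compression timing (illustrative only: lz4 as the compressor, the chunk size, and random data as a stand-in for the benchmark's partially compressible data are all assumptions):

import time

import lz4.frame  # lz4 is what distributed picks for compression when it is installed
import numpy as np


def time_compress(data, repeat=5):
    # Average wall-clock time to lz4-compress `data` once.
    start = time.perf_counter()
    for _ in range(repeat):
        lz4.frame.compress(data)
    return (time.perf_counter() - start) / repeat


n = 16_000_000  # ~128 MiB of float64 per chunk; a guess, not the benchmark's chunk size
random_chunk = np.random.default_rng(0).random(n).tobytes()  # barely compressible
ones_chunk = np.ones(n).tobytes()                            # highly compressible

print(f"random data:   {time_compress(random_chunk) * 1000:.0f} ms per chunk")
print(f"all-ones data: {time_compress(ones_chunk) * 1000:.0f} ms per chunk")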

fjetter commented Mar 14, 2023

Quoting @crusaderky: "I ran the A/B tests and I'm observing a very modest (5%), but consistent speedup"

Well, it's an easy win regardless. Even if it doesn't boost performance significantly, a healthier event loop is already a win for stability.

crusaderky commented Mar 15, 2023

I have produced a meaningful A/B test, running on data that is 42% compressible and takes a substantial amount of time to compress.

  • test_anom_mean is 15-20% slower
  • test_double_diff is 5% faster
  • test_filter_then_average is 5% faster
  • no impact on test_dot_product
  • no impact on test_vorticity
  • inconclusive data on test_map_overlap_sample (the test is too noisy with either compressible or incompressible data)

It's important to note that, before this PR, test_anom_mean and test_double_diff were respectively 50% and 30% slower on compressible data than they were on incompressible data, and test_vorticity was 67% slower on compressible data. I'll open a separate issue to discuss these findings.

[Four images attached]

Successfully merging this pull request may close these issues.

get_data() never offloads serialization