
Pytest-cov leaves .coverage.hostname.number.number data files if running tests against multiprocessing.Pool #250

Closed
manycoding opened this issue Dec 24, 2018 · 22 comments · Fixed by #265


manycoding commented Dec 24, 2018

Running pytest with coverage for tests that use multiprocessing.Pool generates several data files, and not all of them are cleaned up afterwards. It looks similar to #100.
Pool.join() doesn't seem to change anything.
Example filename: .coverage.myhost.local.7582.948066

Here's the repository to reproduce: https://github.com/manycoding/pytest_cov_pool_datafiles_250

Your operating system name and version:
Mac OS Mojave 10.14 (18A391)

Any details about your local setup that might be helpful in troubleshooting:

pipenv shell; pipenv install

Pipfile

pytest = "*"
pytest-cov = "*"
pytest-mock = "*"
tox-pipenv = "*"
pytest-pythonpath = "*"

Detailed steps to reproduce the bug:

pytest --cov=src --cov-report=term-missing  tests/test_cov_pool.py
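
For illustration, here is a minimal, hypothetical sketch of the kind of test that triggers this; the actual reproduction lives in the repository linked above:

# test_cov_pool.py -- hypothetical sketch, not the repository's actual code.
from multiprocessing import Pool

def square(x):
    return x * x

def get_items_with_pool(count):
    # Pool used as a context manager -- the pattern discussed later in this thread.
    with Pool(2) as p:
        return p.map(square, range(count))

def test_get_items_with_pool():
    assert get_items_with_pool(4) == [0, 1, 4, 9]
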
manycoding (Author)

@ionelmc Hi, any thoughts on this?

blueyed (Contributor) commented Feb 14, 2019

Try running coverage combine manually afterwards.
Maybe it gets created after pytest-cov finishes? (check the timestamps)

manycoding (Author) commented Feb 14, 2019

It gets created during pytest execution and I see warnings on these files:

Coverage.py warning: Data file '.coverage.myhost.local.26934.807659' doesn't seem to be a coverage data file: Couldn't use data file '.coverage.myhost.local.26934.807659': no such table: coverage_schema

If I execute coverage report separately, no files are created.
coverage combine throws the same errors as above.

blueyed (Contributor) commented Feb 14, 2019

no such table

That looks like it is using the new format from coveragepy 5.
And that might be caused by using different configs, or rather by something not using the config (e.g. when run via tox / in a temporary dir).

You can try exporting COVERAGE_DEBUG (see the coveragepy docs) to get more information.

manycoding (Author) commented Feb 14, 2019

It happens with both coverage 5 and coverage 4.5.2.
Here is the log I got with export COVERAGE_DEBUG=config,multiproc,sys,plugin:
https://gist.github.com/manycoding/6e92cdb3d994085c7660f7f2127a5dbe

blueyed (Contributor) commented Feb 14, 2019

Looking at "config_reads:" indicates that it always uses your tox.ini, that looks fine.
I would also add "process" to COVERAGE_DEBUG.

What I've meant is that the error looks like coverage 4 might read a file created by 5.

manycoding (Author)

@blueyed I use pipenv and clean envs, so there's no way that coverage 4 and 5 files can be mixed.

manycoding (Author)

Updated the gist above with the debug output that includes process.

blueyed (Contributor) commented Feb 15, 2019

So then you were using coverage 5 with the following error:

Coverage.py warning: Data file '.coverage.myhost.local.26934.807659' doesn't seem to be a coverage data file: Couldn't use data file '.coverage.myhost.local.26934.807659': no such table: coverage_schema

Have you inspected the file? Is it maybe just empty? Maybe coveragepy is crashing after all.

Anyway, I guess I cannot really help further.

manycoding (Author) commented Feb 15, 2019

Yep, it's empty, and I think it's the same problem as in #100, appearing for no particular reason.

I tested it with both coverage 4 and 5, and the problem persists in both versions.

blueyed (Contributor) commented Feb 15, 2019

You could try to figure out where the file gets created: either from the logs (search them for the pid in the filename), and/or by bisecting your tests (i.e. run the first 50%, then the rest, etc.) to narrow down which test creates it.

ionelmc (Member) commented Feb 16, 2019

So I've traced the example a bit. It turns out that once in a while there's an unclean shutdown, e.g.:

551756 19:25:52.717097916 3 py.test (27575) < openat fd=10(<f>/code/.coverage.ec28a80d3a90.3493.150512) dirfd=-100(AT_FDCWD) name=/code/.coverage.ec28a80d3a90.3493.150512 flags=4358(O_TRUNC|O_CREAT|O_WRONLY|O_CLOEXEC) mode=0666
551757 19:25:52.717099916 3 py.test (27575) > fstat fd=10(<f>/code/.coverage.ec28a80d3a90.3493.150512)
551758 19:25:52.717101016 3 py.test (27575) < fstat res=0
551759 19:25:52.717103516 3 py.test (27575) > ioctl fd=10(<f>/code/.coverage.ec28a80d3a90.3493.150512) request=5401 argument=7FFF9CA7E740
551760 19:25:52.717104616 3 py.test (27575) < ioctl res=-25(ENOTTY)
551761 19:25:52.717108516 3 py.test (27575) > lseek fd=10(<f>/code/.coverage.ec28a80d3a90.3493.150512) offset=0 whence=1(SEEK_CUR)
551762 19:25:52.717109016 3 py.test (27575) < lseek res=0
551763 19:25:52.717111916 3 py.test (27575) > ioctl fd=10(<f>/code/.coverage.ec28a80d3a90.3493.150512) request=5401 argument=7FFF9CA7E6C0
551764 19:25:52.717112316 3 py.test (27575) < ioctl res=-25(ENOTTY)
551765 19:25:52.717133316 3 py.test (27575) > lseek fd=10(<f>/code/.coverage.ec28a80d3a90.3493.150512) offset=0 whence=1(SEEK_CUR)
551766 19:25:52.717134116 3 py.test (27575) < lseek res=0
551767 19:25:52.717173016 0 py.test (27500) > write fd=12(<p>) size=8
551768 19:25:52.717175917 0 py.test (27500) < write res=8 data=
N.
551769 19:25:52.717189917 3 py.test (27575) > lstat
551770 19:25:52.717193517 3 py.test (27575) < lstat res=0 path=/usr
551771 19:25:52.717200517 3 py.test (27575) > lstat
551772 19:25:52.717202417 3 py.test (27575) < lstat res=0 path=/usr/lib
551773 19:25:52.717204817 0 py.test (27500) > wait4
551774 19:25:52.717206917 3 py.test (27575) > lstat
551775 19:25:52.717208617 3 py.test (27575) < lstat res=0 path=/usr/lib/python3.7
551776 19:25:52.717212117 0 py.test (27500) < wait4
551777 19:25:52.717213017 3 py.test (27575) > lstat
551778 19:25:52.717214717 3 py.test (27575) < lstat res=0 path=/usr/lib/python3.7/json
551779 19:25:52.717218917 3 py.test (27575) > lstat
551780 19:25:52.717220317 0 py.test (27500) > wait4
551781 19:25:52.717220917 3 py.test (27575) < lstat res=0 path=/usr/lib/python3.7/json/encoder.py
551782 19:25:52.717221817 0 py.test (27500) < wait4
551783 19:25:52.717229717 0 py.test (27500) > kill pid=3492 sig=15(SIGTERM)
551784 19:25:52.717247717 0 py.test (27500) < kill res=0
551785 19:25:52.717251317 0 py.test (27500) > wait4
551786 19:25:52.717252017 0 py.test (27500) < wait4
551787 19:25:52.717255217 0 py.test (27500) > kill pid=3493 sig=15(SIGTERM)
551788 19:25:52.717256217 1 py.test (27574) > signaldeliver spid=0 dpid=27574(py.test) sig=9(SIGKILL)
551789 19:25:52.717266017 0 py.test (27500) < kill res=0
551790 19:25:52.717268717 3 py.test (27575) > signaldeliver spid=0 dpid=27575(py.test) sig=9(SIGKILL)

ionelmc (Member) commented Feb 16, 2019

So this really simple trick (/s) makes the problem go away:

@@ -14,7 +14,8 @@ def get_items_with_pool(count: int, start_index: int = 0):
     processes_count = os.cpu_count()

     batch_size = math.ceil(count / processes_count)
-    with Pool(processes_count) as p:
+    p = Pool(processes_count)
+    try:
         results = p.starmap(
             get_items,
             zip(
@@ -24,10 +25,7 @@ def get_items_with_pool(count: int, start_index: int = 0):
         )
         for items_batch in results:
             items.extend(items_batch)
-    p.join()
+    finally:
+        p.close()
+        p.join()
     return items

For some reason Pool.__exit__ sends KILL in addition to TERM ...
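
As a self-contained sketch of the same workaround (function names are illustrative, not the repository's actual code): close() the pool and then join() it explicitly, so the workers exit cleanly and coverage gets a chance to flush its data files, instead of relying on Pool.__exit__, which calls terminate().

import os
from multiprocessing import Pool

def get_items(start, size):
    return list(range(start, start + size))

def get_items_with_pool(count, batch_size=10):
    items = []
    p = Pool(os.cpu_count())
    try:
        starts = range(0, count, batch_size)
        results = p.starmap(get_items, ((s, min(batch_size, count - s)) for s in starts))
        for batch in results:
            items.extend(batch)
    finally:
        p.close()  # no more work will be submitted
        p.join()   # wait for the workers to exit instead of killing them
    return items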

ionelmc (Member) commented Feb 16, 2019

So my plan for handling this situation is to offer users two solutions:

ionelmc (Member) commented Feb 16, 2019

So the secondary issue, if you use pytest_cov.embed.cleanup_on_sigterm, is that it overlaps with multiprocessing_start, which also registers a cleanup handler via multiprocessing.util.Finalize.

Thus sometimes, while the finalizer runs, the signal handler will also run and leave one unfinished file around (it calls os._exit and the other file is left dangling).
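
For reference, the pattern suggested in pytest-cov's subprocess-support docs is to register the SIGTERM cleanup handler in the code that will run inside the child process; a minimal sketch:

# Register pytest-cov's SIGTERM cleanup handler in the subprocess code.
# The try/except keeps runs without pytest-cov installed working.
try:
    from pytest_cov.embed import cleanup_on_sigterm
except ImportError:
    pass
else:
    cleanup_on_sigterm()

For multiprocessing.Pool itself, the explicit close()/join() pattern shown a few comments above remains the relevant fix.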

ssbarnea (Member) commented Feb 6, 2020

I am randomly seeing something very similar to this on CI, where we wipe everything between runs. See https://dashboard.zuul.ansible.com/t/ansible/build/28f84185f22a448f92f00f47253fc931

My impression is that it happens only on py27 jobs, but it is still a PITA and I do not know what to do to avoid it.

cgnorthcutt

I also see this issue on TravisCI, running Windows Server with Python 3.6 and Python 3.7. Scroll down to the bottom of the output here: https://travis-ci.com/cgnorthcutt/cleanlab/jobs/291188046

ssbarnea (Member)

This forced me to pin the coverage version below 5 for all versions of Python. Mainly this is caused by the fact that coverage expects all data files to be generated by the same version of coverage. If you happen to still support py27, you are forced to use the older coverage for all environments, not only those using py27.

I guess that the decision to drop py27 support backfires because it forces many users to avoid the newer version completely.

ssbarnea (Member)

It seems that I keep seeing these random-looking errors: https://dashboard.zuul.ansible.com/t/ansible/build/fa4b20a11a484ee39050b11f619484b1 -- still more likely on py27.

domeniconappo

Hi,
I also see those errors when I test some functions that use Dask for multiprocessing.
Could it be a regression? It seems the issue was fixed in Feb 2019, but it is popping up again.

Thanks!

GergelyKalmar

This issue seems to have appeared for me too with coverage 5.0 and pytest 5.4.2.

ionelmc (Member) commented May 21, 2020

@GergelyKalmar please open a new issue and add details.
