Engine: daemon restart fails with circus.exc.ConflictError
#6041
Comments
I've been running into this one quite consistently this morning, so reproduction has become less of an issue.
Now also running on
One of my processes seems to have also excepted due to this issue:
  File "/Users/mbercx/.virtualenvs/super/lib/python3.9/site-packages/plumpy/events.py", line 97, in run
    await self._callback(*self._args, **self._kwargs)
  File "/Users/mbercx/.virtualenvs/super/lib/python3.9/site-packages/plumpy/processes.py", line 567, in _run_task
    result = await coro(*args, **kwargs)
  File "/Users/mbercx/.virtualenvs/super/lib/python3.9/site-packages/plumpy/utils.py", line 245, in wrap
    return coro_or_fn(*args, **kwargs)
  File "/Users/mbercx/project/super/code/aiida-core/aiida/engine/processes/workchains/workchain.py", line 413, in _on_awaitable_finished
    self.resume()
  File "/Users/mbercx/.virtualenvs/super/lib/python3.9/site-packages/plumpy/base/state_machine.py", line 98, in transition
    result = wrapped(self, *a, **kw)
  File "/Users/mbercx/.virtualenvs/super/lib/python3.9/site-packages/plumpy/processes.py", line 1112, in resume
    return self._state.resume(*args) # type: ignore
  File "/Users/mbercx/.virtualenvs/super/lib/python3.9/site-packages/plumpy/process_states.py", line 339, in resume
    self._waiting_future.set_result(value)
asyncio.exceptions.InvalidStateError: invalid state
Thanks for the report @mbercx. I am pretty sure that the error in the circus log is unrelated to any
The second stack trace is problematic, however, as it actually causes AiiDA processes to except. I am not sure what could cause this. Apparently the value that is getting set on the future is invalid, but annoyingly the exception raised by
If you say that you can consistently reproduce it, what would be super useful is to add some print statements to
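For reference, here is a minimal sketch in plain asyncio (not plumpy or AiiDA code) of how an `InvalidStateError` like the one in the second traceback can arise. As far as the standard library is concerned, `Future.set_result` raises this exception when the future is already done or cancelled, whatever value is passed. Whether that is what happens to `_waiting_future` here is an assumption and not confirmed by the logs; `try_set_result` is a hypothetical helper used only for illustration.

```python
import asyncio


def try_set_result(future, value):
    """Try to set a result; return False if the future was already done or cancelled."""
    try:
        future.set_result(value)
    except asyncio.InvalidStateError:
        # Raised when the future already holds a result or exception, or was cancelled.
        return False
    return True


async def main():
    loop = asyncio.get_running_loop()

    # Case 1: a second set_result on the same future is rejected.
    resolved = loop.create_future()
    first = try_set_result(resolved, "first")
    second = try_set_result(resolved, "second")

    # Case 2: setting a result on a cancelled future is rejected as well.
    cancelled = loop.create_future()
    cancelled.cancel()
    after_cancel = try_set_result(cancelled, "late")

    return first, second, after_cancel, resolved.result()


outcome = asyncio.run(main())
```

If this is the mechanism, printing `future.done()` and `future.cancelled()` just before the failing `set_result` call would show which of the two cases applies.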
This was for the circus-related error. I'll see if I will still bump into the other one.
This may be of interest: circus-tent/circus#1202
Describe the bug
From time to time, I find my processes are no longer updating. When trying to restart the daemon, stopping the daemon fails due to a timeout:
Restarting afterwards is not a problem:
I find no errors in the daemon logs, but in the circus logs I find the following (trimmed for brevity, full error messages log below):
Full Error log
Steps to reproduce
I haven't found a way to consistently reproduce the problem yet, but it seems to occur more often when I am running processes that involve large data transfers. For the event above, a dozen or so calculation jobs had paused due to connection issues, which I then restarted with verdi process play -a.
Your environment
sph/fix/6013/verdi-computer-test branch, it seems. Commit 47cd515, but I've also had it happen when running on main.
circus: 0.18.0
Additional context
From an offline discussion I know @unkcpz has also run into this issue.