Scheduler crashes in SSHCluster in 2023.3.2 but not in 2023.3.1 #7724
Comments
Rolling this back sorts the issue: #7631
cc @milesgranger @jacobtomlinson for visibility
I did a little more digging and the crash happens somewhere in here, specifically in the template format call: distributed/distributed/utils.py, lines 1252 to 1254 in f102357.
Replacing that call so the function returns a constant string avoids the crash.
I've figured out what is going on here but I don't know how to fix it in dask. I have the following environment variable set: DASK_DISTRIBUTED__DASHBOARD__LINK='{JUPYTERHUB_EXTERNAL_BASE_URL}{JUPYTERHUB_SERVICE_PREFIX}proxy/{port}/status'. This is somehow making its way over to the SSHCluster (I'm assuming via dask config serialization). The issue is that the environment variables the template refers to (JUPYTERHUB_EXTERNAL_BASE_URL, JUPYTERHUB_SERVICE_PREFIX) are not available in the SSH session, since they are set in the profile, so the template.format call is failing:
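The failure mode described above can be reproduced without dask at all. The following is a minimal sketch, not the code in distributed/utils.py: `format_link`, the example hub URL, and the user prefix are hypothetical stand-ins, and the only assumption carried over from the thread is that the link template is filled in with `str.format` using the process environment plus the port.

```python
# Hypothetical stand-in for the link formatting step; not dask's actual function.
def format_link(template, env, port):
    # str.format raises KeyError for any named field missing from the mapping.
    return template.format(**{**env, "port": port})


template = "{JUPYTERHUB_EXTERNAL_BASE_URL}{JUPYTERHUB_SERVICE_PREFIX}proxy/{port}/status"

# Environment on the JupyterHub host (values are made up for illustration).
host_env = {
    "JUPYTERHUB_EXTERNAL_BASE_URL": "https://hub.example.org",
    "JUPYTERHUB_SERVICE_PREFIX": "/user/alice/",
}
# Environment inside the SSH session, where the profile was not sourced.
ssh_env = {}

link_on_host = format_link(template, host_env, 8787)

try:
    format_link(template, ssh_env, 8787)
    missing = None
except KeyError as exc:
    # The first placeholder that cannot be resolved is reported.
    missing = exc.args[0]

print(link_on_host)
print("missing variable in SSH session:", missing)
```

The same template that works on the host raises `KeyError` in the SSH session, which is consistent with the scheduler exiting early there.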
I understand how to get the correct scheduler link manually. I'd prefer that this situation not crash the scheduler, and that it instead fall back on its old behavior if the link can't be crafted. PS. These errors are not being propagated back to the process that started the cluster, which has made debugging this much harder.
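One possible shape for the fallback requested above is sketched below. This is an illustration under stated assumptions rather than a proposed patch: `safe_dashboard_link` and `DEFAULT_TEMPLATE` are hypothetical names, and the default template string is assumed to be a plain scheme/host/port link.

```python
import logging

logger = logging.getLogger(__name__)

# Assumed plain default link; hypothetical constant for this sketch.
DEFAULT_TEMPLATE = "{scheme}://{host}:{port}/status"


def safe_dashboard_link(template, env, host, port, scheme="http"):
    """Format the dashboard link, falling back to a plain link on failure."""
    values = {**env, "scheme": scheme, "host": host, "port": port}
    try:
        return template.format(**values)
    except KeyError as exc:
        # A placeholder in the template is not defined in this environment.
        # Log it instead of letting the exception stop the scheduler.
        logger.warning(
            "Could not format dashboard link %r: missing %s; using default",
            template,
            exc,
        )
        return DEFAULT_TEMPLATE.format(scheme=scheme, host=host, port=port)


# In the SSH session the JupyterHub variables are absent, so the default is used.
fallback = safe_dashboard_link(
    "{JUPYTERHUB_SERVICE_PREFIX}proxy/{port}/status", {}, "10.0.0.1", 8787
)
# On the host the variable is present and the custom template is honoured.
custom = safe_dashboard_link(
    "{JUPYTERHUB_SERVICE_PREFIX}proxy/{port}/status",
    {"JUPYTERHUB_SERVICE_PREFIX": "/user/alice/"},
    "10.0.0.1",
    8787,
)
print(fallback)
print(custom)
```

Only `KeyError` is caught here, since that is the error a missing named placeholder produces; a malformed template would still raise.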
Thanks for taking the time to dig into this. It sounds like there are two things going on here. First is that when The other part is can we make it so that if distributed/distributed/scheduler.py Line 3873 in 78a926d
Indeed this would be the best solution.
When I initially read this, I didn't totally understand what you meant by "misconfigured" here. As I understand it, the problem is that the link includes an environment variable that exists only on the host and not on the cluster. Thus, these would be incorrect...
export DASK_DISTRIBUTED__DASHBOARD__LINK="{JUPYTERHUB_EXTERNAL_BASE_URL}{JUPYTERHUB_SERVICE_PREFIX}proxy/{port}/status"
# or
export DASK_DISTRIBUTED__DASHBOARD__LINK="{JUPYTERHUB_SERVICE_PREFIX}proxy/{port}/status"
...and this would be correct:
export DASK_DISTRIBUTED__DASHBOARD__LINK="proxy/{port}/status"
However, the correct link doesn't work. Suppose I have a JupyterHub deployment and I access my notebook server at:
Setting my
This is incorrect due to the inclusion of The recommendation in Dask documentation of How should users configure the
I think this question is separate from the bug highlighted here. Could you open a new issue for this?
Good idea, and done - see #7736.
Describe the issue: Attempting to use the SSHCluster does not work in 2023.3.2 because the scheduler exits early with an exit code of 1.
When rolling back to 2023.3.1 the scheduler starts successfully:
Minimal Complete Verifiable Example:
Anything else we need to know?: Full repro here:
Environment: