The source code of the DataModule confirms that the `map_location` argument is indeed set to `None` in the `_load_from_checkpoint` function call within the `load_from_checkpoint` method. This is the root cause of the issue.
The proposed solution would involve modifying the `load_from_checkpoint` method of the DataModule to accept `map_location` as an argument and pass it through to `_load_from_checkpoint`. This would allow the user to specify the device to which storages should be mapped when loading the checkpoint.
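A sketch of that change, using stdlib-only stand-ins for the Lightning internals (these function names mirror, but are not, the real `pytorch_lightning` code):

```python
# Illustrative stand-ins for the Lightning internals; NOT the real
# pytorch_lightning functions, just a sketch of the proposed change.

def _load_from_checkpoint(cls, checkpoint_path, map_location=None, **kwargs):
    # Stand-in for pytorch_lightning.core.saving._load_from_checkpoint.
    return {"cls": cls, "path": checkpoint_path, "map_location": map_location}

def load_from_checkpoint(cls, checkpoint_path, map_location=None, **kwargs):
    # Proposed: accept map_location and forward the user's value,
    # instead of the current behavior of always passing map_location=None.
    return _load_from_checkpoint(
        cls, checkpoint_path, map_location=map_location, **kwargs
    )

loaded = load_from_checkpoint(object, "ckpt.pt", map_location="cpu")
print(loaded["map_location"])  # cpu
```

With this signature the user's `map_location` reaches the deserialization call exactly once, so no keyword collision can occur.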
Bug description
It seems impossible to properly load a DataModule from a CPU-only machine when it has been checkpointed from a GPU-enabled machine.
What version are you seeing the problem on?
v2.0
How to reproduce the bug
Here is a minimal reproducible example of the bug I am facing (based on the "hello world" examples on the PyTorch Lightning webpage):
Checkpoint the data module on a machine where `torch.cuda.is_available()` returns `True`. Then try to load it on a CPU-only machine where `torch.cuda.is_available()` returns `False`:

When I remove the `map_location=torch.device('cpu')` argument, I get the error `RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.`, which is expected.

When I add `map_location=torch.device('cpu')`, the LightningModule is loaded correctly, but loading the data module fails with `TypeError: pytorch_lightning.core.saving._load_from_checkpoint() got multiple values for keyword argument 'map_location'`.

After checking the source code of the method `LightningDataModule.load_from_checkpoint` (see: https://lightning.ai/docs/pytorch/stable/_modules/lightning/pytorch/core/datamodule.html#LightningDataModule.load_from_checkpoint), I see that the problem comes from the fact that the `map_location` argument is already set to `None` in its internal call to `_load_from_checkpoint`, which causes the error.

Would it be possible to add `map_location` as an argument to `LightningDataModule.load_from_checkpoint` to solve this bug?

Error messages and logs
Environment
Current environment
More info
No response
cc @awaelchli