encountered different devices in metric calculation #1361
Comments
Do you have some piece of minimal code for me? Then I'd be happy to fix the bug :)
Basically, where you bring the metrics into the class, you have to pass them to the device. There is a dict in utils_metrics.py where the MeanAbsoluteError function is initialized, and when you call the function, like in forecast.py, you would have to do something like MeanAbsoluteError().to(device). I have been trying to figure out the proper entry point for this but had to move on to other work. My guess is to set it at the point where you determine whether the device is CPU or GPU and metrics are enabled.
Thanks for your tip! It seems like we have found the error. We think the metrics are not properly configured inside the LightningModule. The documentation says there is "No need to call .to(device) anymore!" (https://torchmetrics.readthedocs.io/en/latest/pages/lightning.html), but that only holds if the metric is defined inside a LightningModule, and we are defining them outside, in utils_metrics.py.
So basically I am doing multi-threaded model training runs on a batch dataframe with ThreadPoolExecutor(), something like:
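The code itself was not included in the thread; a hypothetical sketch of that kind of setup, with a placeholder `train_one` standing in for the per-partition NeuralProphet fit:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Placeholder for the real per-group training call; in the actual setup
# each call would fit a model on one partition of the batch dataframe.
def train_one(group_name, rows):
    return group_name, len(rows)

partitions = {"store_1": [1, 2, 3], "store_2": [4, 5]}

results = {}
with ThreadPoolExecutor() as pool:
    futures = {pool.submit(train_one, name, rows): name
               for name, rows in partitions.items()}
    for fut in as_completed(futures):
        name, n_rows = fut.result()
        results[name] = n_rows
```

With several such threads sharing one GPU, any metric object created on CPU and reused across threads will hit the device-mismatch error described in this issue.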
Hi @webcoderz, I tried to fix the bug and I think it works now, but I don't have a computer with a GPU. Would you like to try whether this PR fixes the bug? Then I'm happy to merge and do a new release. Thank you!
Yes, will test it tomorrow!
Sorry, got wrapped up at work getting a release pushed out. Will get on this first thing tomorrow!
LGTM!
@webcoderz that's great! Happy to hear that! I will go ahead, merge, and make a new release.
Nice work! It went great 😊 This is going to save a bunch of time for these large GPU workloads I have.
@webcoderz, so happy that I could help you. I actually also just started to train on a really huge dataset and really don't know where to start...
Sure! cody.l.webb@gmail.com Feel free to reach out anytime! |
This is great; you will get some experience working with big data that will help you understand some of the things that need to be considered here!
Fixed with #1365
Prerequisites
If you have the same question but the Answer does not solve your issue, please continue the conversation there.
If you have the same issue but there is a twist to your situation, please add an explanation there.
Describe the bug
When doing multi-threaded GPU training sessions, the metrics are not moving the tensors from CPU to GPU, and I am receiving this error:
Encountered different devices in metric calculation. This could be due to the metrics class not being on same device as the input. Instead of MeanAbsoluteError() use MeanAbsoluteError().to(device)
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Metrics should cleanly follow the tensors from CPU to GPU. utils_metrics.py only calls the error function to be used (L7), with no device tracking of any sort.
What actually happens
Describe what happens, and how often it happens.
Screenshots
If applicable, add screenshots and console printouts to help explain your problem.
Environment (please complete the following information):
Install method: pip install neuralprophet
Version: 0.5.4
Additional context
Add any other context about the problem here.