polling: low performance with CPU accelerator and async streams #368
Comments
I am not sure I understand your problem correctly; could you please elaborate on it a bit more?
It may be possible to replace the mutex lock within the
The performance of the test call itself is OK. The issue is that the master thread (the thread which issues kernel calls and records events) polls all events at a very high frequency to find out which event has finished. A clean solution would be asynchronous callbacks, but these are either not supported by CUDA or would increase the cost of starting a kernel. This issue is not high priority, but I opened it so that we can work on it in the future.
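For illustration only (this is not the alpaka implementation, just a stand-in using std::atomic flags in place of real events): the pattern described above, where one master thread keeps re-testing every event until all of them have finished, looks roughly like this and occupies a full CPU core while it spins.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

struct Event
{
    std::atomic<bool> finished{false}; // stand-in for a real event object
};

int main()
{
    std::vector<Event> events(4);
    std::vector<std::thread> workers;

    // Stand-in for asynchronous work completing on other threads/streams.
    for(auto& ev : events)
        workers.emplace_back(
            [&ev]
            {
                std::this_thread::sleep_for(std::chrono::milliseconds(10));
                ev.finished = true;
            });

    // The master thread spins over all events until every one has finished;
    // this polling alone keeps one CPU core fully busy.
    std::size_t done = 0;
    while(done < events.size())
    {
        done = 0;
        for(auto const& ev : events)
            done += ev.finished ? 1 : 0;
    }

    for(auto& t : workers)
        t.join();
    return 0;
}
```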
Asynchronous callbacks are supported via
The CPU backends simply start a new thread that runs the user callback when it is reached.
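For reference, a minimal sketch of what such a host callback could look like with the plain CUDA runtime API (assuming cudaStreamAddCallback is the mechanism meant above); the kernel and the label are made-up placeholders, not alpaka code. The callback is enqueued behind the kernel and runs on a driver-managed thread once all preceding work in the stream has completed.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummyKernel()
{
}

void CUDART_CB notifyDone(cudaStream_t /*stream*/, cudaError_t status, void* userData)
{
    // CUDA API calls are not allowed from inside a stream callback.
    std::printf("%s finished, status = %d\n", static_cast<char*>(userData), static_cast<int>(status));
}

int main()
{
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    static char label[] = "kernel batch 1"; // hypothetical user data
    dummyKernel<<<1, 1, 0, stream>>>();
    cudaStreamAddCallback(stream, notifyDone, label, 0); // flags are reserved and must be 0

    cudaStreamSynchronize(stream);
    cudaStreamDestroy(stream);
    return 0;
}
```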
The problem with CUDA callbacks is that we would then need to start a kernel, add a callback, and add an event per work item. A driver call such as recording an event or launching a kernel costs around 14 µs, if I remember correctly. In the case of PIConGPU we record up to 10000 events per second. If we implement callbacks, we also need to take care of HIP and OpenACC.
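For a rough sense of scale, assuming the ~14 µs figure above holds: one extra driver call for each of 10000 events per second is already about 10000 × 14 µs ≈ 0.14 s of launch-side overhead per second, and issuing kernel + callback + event per work item would multiply that further.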
I will try to implement those callbacks nevertheless. There is an equivalent
If multiple asynchronous streams are used on a CPU accelerator and the user program issues a lot of test calls to check the state of events and streams, then we lose one CPU core to the polling action.
The overhead could be reduced if we add the possibility to assign priorities to a device and/or stream.
This is only a suggestion; we need to evaluate the possibilities.
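One possible mitigation, sketched below under the assumption that the wait can tolerate a little extra latency (this is not an existing alpaka API, just a generic technique): back off between polling rounds with yield/sleep so the polling thread no longer occupies a full core. Device/stream priorities, as suggested above, would be a separate mechanism on top of this.

```cpp
#include <algorithm>
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>

// Poll `isFinished` with exponential backoff instead of spinning at full speed.
void waitWithBackoff(std::function<bool()> const& isFinished)
{
    auto sleepTime = std::chrono::microseconds(1);
    auto const maxSleep = std::chrono::microseconds(256);
    while(!isFinished())
    {
        std::this_thread::yield();              // let other threads run first
        std::this_thread::sleep_for(sleepTime); // then sleep a little
        sleepTime = std::min<std::chrono::microseconds>(maxSleep, sleepTime * 2); // back off up to a cap
    }
}

int main()
{
    std::atomic<bool> flag{false};
    std::thread worker(
        [&flag]
        {
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
            flag = true;
        });
    waitWithBackoff([&flag] { return flag.load(); });
    worker.join();
    return 0;
}
```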