Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL][DOC] Add initial spec draft for new default devices and queues #15382

Open
wants to merge 8 commits into
base: sycl
Choose a base branch
from

Conversation

jbrodman
Copy link
Contributor

This is a draft for a new extension to add the notion of a default device and default queues for devices.

It also extends the new launch mechanisms with queue-less forms that utilize the default device's default queue.

Signed-off-by: James Brodman <james.brodman@intel.com>
@jbrodman jbrodman requested a review from a team as a code owner September 12, 2024 18:35
Comment on lines 113 to 114
_Returns:_ The current default device. The initial device returned is the device
returned by the default device selector.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the meaning of "initial device" could be clarified here. Maybe say something like:

Suggested change
_Returns:_ The current default device. The initial device returned is the device
returned by the default device selector.
_Returns:_ The current default device, as set by `set_current_device()`.
If `set_current_device()` has not been called, returns the device
returned by the default device selector.

Also, a little bit of bikeshedding: it seems a little weird to me that the functions are called get_current_device() but the text describes it as the "current default device". I think we should consistently use one or two words to describe this concept. I think I'm leaning towards get_default_device() and set_default_device(). That naming is more aligned with get_default_context() and get_default_queue().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I had some reason I went sour on "default" but forgot what it is.


_Returns:_ Nothing

_Remarks:_ Sets the current default device to `dev`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_Remarks:_ Sets the current default device to `dev`.
_Effects:_ Sets the current default device to `dev`.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions:

  • Does this set the default for this thread, or all threads?
  • Does this change the device returned by later calls to the default selector? (Should it?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioned as an open at the bottom.

Comment on lines 165 to 168
sycl::malloc_device(numBytes,
get_current_device(),
get_current_device().get_platform().ext_oneapi_get_default_context(),
propList)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was initially a little surprised by this, because I usually use the overloads accepting a queue.

Is the device's default queue guaranteed to be associated with the default context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the intent is that a device's default queue will use the device's platform's default context.

----
!====

_Returns:_ The default queue for the device.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we clarify here that the default queue must be associated with the default context?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

Comment on lines 976 to 977
. [UNRESOLVED] Should the currrent device be global or should we also support a per-thread
device?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, the CUDA versions are always per-thread: https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html?highlight=cudaSetDevice#device-selection

I can see this behavior being useful for something like an OpenMP+SYCL application -- you could call set_default_device() on each OpenMP thread to round-robin assign your devices to threads, then not worry about it again.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts here that are relevant if we decided that each thread has its own default device:

  • We should clarify the behavior when a new thread is created. Does the new thread inherit the parent thread's default device? If not, is the initial default device for the new thread guaranteed to be the device selected by the default selector? Or, is the initial default device for a new thread unspecified?

  • We have discussed in the past the possibility of expanding the set of APIs that can be called from a host task. It is possible that we will allow a host task to call some of the APIs in this extension. If that happens, what guarantees does the application have about the default device in a host task?

jbrodman and others added 5 commits September 16, 2024 12:21
…sciidoc

Co-authored-by: John Pennycook <john.pennycook@intel.com>
…sciidoc

Co-authored-by: John Pennycook <john.pennycook@intel.com>
…sciidoc

Co-authored-by: John Pennycook <john.pennycook@intel.com>
…sciidoc

Co-authored-by: John Pennycook <john.pennycook@intel.com>
…sciidoc

Co-authored-by: John Pennycook <john.pennycook@intel.com>
[source,c++]
----
namespace sycl {
class queue {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class queue {
class device {


[source, c++]
----
submit_with_event(get_current_device().ext_oneapi_get_default_queue(), cgf)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
submit_with_event(get_current_device().ext_oneapi_get_default_queue(), cgf)
submit(get_current_device().ext_oneapi_get_default_queue(), cgf)

----
namespace sycl::ext::oneapi::experimental {

void memcpy(void* dest, const void* src, size_t numBytes);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I predict that many users will be confused, thinking this is a synchronous operation. For example, syclex::malloc_shared and syclex::memcpy are both memory operations. How are users supposed to know that the first is synchronous while the second is asynchronous?

Maybe we should add the suffix "_async" to all the functions that are asynchronous?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how I feel about the idea of an "_async" suffix at this point. I'm a little worried that explicitly calling out that some functions are asynchronous would introduce the opposite problem to what you're trying to solve: because functions like submit and parallel_for aren't called submit_async and parallel_for_async, a new developer might conclude they're synchronous.

(Aside: I would be okay with introducing an "_async" suffix if that's the direction C++ takes for its asynchronous STL functions, but then we'd need to roll out the suffix everywhere and deprecate the previous function names.)

Another potential fix would be to say that all these default queue shorthands are blocking...? I guess then there's a confusion that sycl::memcpy(dest, src, numBytes) is synchronous but sycl::memcpy(q, dest, src, numBytes) is asynchronous...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little worried that explicitly calling out that some functions are asynchronous would introduce the opposite problem to what you're trying to solve: because functions like submit and parallel_for aren't called submit_async and parallel_for_async, a new developer might conclude they're synchronous.

I was thinking that we could add "_async" to all of the asynchronous functions in the sycl_ext_oneapi_enqueue_functions extension, but we would leave the functions in SYCL 2020 named as they are now.

----
namespace sycl::ext::oneapi::experimental {

(1) void* sycl::malloc_device(size_t numBytes, const property_list& propList = {});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our style is to put these numeric callouts to the right of the API synopsis. In addition, insert extra spaces to line them up vertically within a synopsis code block. See here for an example.

Comment on lines 976 to 977
. [UNRESOLVED] Should the currrent device be global or should we also support a per-thread
device?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some thoughts here that are relevant if we decided that each thread has its own default device:

  • We should clarify the behavior when a new thread is created. Does the new thread inherit the parent thread's default device? If not, is the initial default device for the new thread guaranteed to be the device selected by the default selector? Or, is the initial default device for a new thread unspecified?

  • We have discussed in the past the possibility of expanding the set of APIs that can be called from a host task. It is possible that we will allow a host task to call some of the APIs in this extension. If that happens, what guarantees does the application have about the default device in a host task?

Signed-off-by: James Brodman <james.brodman@intel.com>
Signed-off-by: James Brodman <james.brodman@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants