Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCT/API/UCP: Hide slow-path progress elem and refcount in uct_worker. #1583

Merged
merged 3 commits into from
Jun 14, 2017

Conversation

yosefe
Copy link
Contributor

@yosefe yosefe commented Jun 6, 2017

No description provided.

This commit does some preparations in UCT for the change in callback
queue API:
+ Block async context outside of callback queue calls.
+ Keep callback reference count in a handle in uct_base_iface_t/
+ Instead of exposing slow-path callback queue element in UCT API, hide
  it inside uct worker code. This will allocate the element internally
  and release when finished.
@yosefe yosefe force-pushed the topic/uct-api-no-slow-elem branch from 03d8b3c to 58bf758 Compare June 6, 2017 20:27
@mellanox-github
Copy link
Contributor

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/1814/ for details.

@mellanox-github
Copy link
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/3754/ for details (Mellanox internal link).

@yosefe
Copy link
Contributor Author

yosefe commented Jun 7, 2017

@brminich @shamisp pls review

@shamisp
Copy link
Contributor

shamisp commented Jun 7, 2017

adding @bbenton for API part

Copy link
Contributor

@brminich brminich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused with namings. So now uct_worker_progress_register_safe is used for adding to slow path? Is the intention that user should not add to fast path at all?

@@ -403,8 +403,12 @@ static void uct_mm_iface_recv_messages(uct_mm_iface_t *iface)
ret = recvfrom(iface->signal_fd, &sig, sizeof(sig), 0, NULL, 0);
if (ret == sizeof(sig)) {
ucs_debug("mm_iface %p: got connect message", iface);
ucs_callbackq_add_safe(&iface->super.worker->progress_q,
uct_mm_iface_progress, iface);
if (iface->super.prog.refcount++ == 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why uct_worker_progress_register is not used instead?

Copy link
Contributor Author

@yosefe yosefe Jun 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because we need to use "safe" mode here - it's called from async thread

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but ucs_callbackq_add_safe is the same as ucs_callbackq_add. Also UCS_ASYNC_BLOCK is called in
uct_worker_progress_register

Copy link
Contributor Author

@yosefe yosefe Jun 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. ucs_callbackq_add_safe is currently the same as ucs_callbackq_add, but the user of callbackq should not take advantage of that knowldge, since according to API _safe version should be used if calling from another thread
  2. ASYNC_BLOCK is not needed from the async context itself

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this will be changed to calling uct_worker_progress_register_safe with some flag in the next PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the goal of this PR is mainly to make next PR fit into 500 lines

@yosefe
Copy link
Contributor Author

yosefe commented Jun 7, 2017

@brminich this PR allows users register only to slow-path. After next PR it would be possible to register to fast-path by passing flags.

Copy link
Contributor

@brminich brminich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@MattBBaker
Copy link
Contributor

bot:ornl:retest

@yosefe
Copy link
Contributor Author

yosefe commented Jun 13, 2017

@bbenton @shamisp can you pls review?

@mellanox-github
Copy link
Contributor

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/1858/ for details.

@mellanox-github
Copy link
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/3826/ for details (Mellanox internal link).

@shamisp
Copy link
Contributor

shamisp commented Jun 13, 2017

@MattBBaker did u test it on cray ?

@MattBBaker
Copy link
Contributor

@shamisp Builds and runs fine on Cray.

*
* @note This operation could potentially be slow.
* @param [in] worker Handle to worker.
* @param [in] func Pointer to callback function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pointer to the callback function

* @note This operation could potentially be slow.
* @param [in] worker Handle to worker.
* @param [in] func Pointer to callback function.
* @param [in] arg Argument to the function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argument for the callback function

@bbenton
Copy link
Contributor

bbenton commented Jun 13, 2017

@yosefe,:q
I still see the following problems in uct.h:

  • uct_worker_slowpath_progress_register, rather than uct_worker_progress_register_safe, is still referenced in the description of uct_worker_progress_unregister_safe()
  • uct_worker_progress_register is referenced in the description of uct_worker_progress_register_safe, yet it is no longer an exposed function, right?

Am I missing something?

@yosefe yosefe force-pushed the topic/uct-api-no-slow-elem branch from 3ec6b75 to 6eed080 Compare June 14, 2017 10:24
@yosefe yosefe force-pushed the topic/uct-api-no-slow-elem branch from 6eed080 to e290cb0 Compare June 14, 2017 10:27
@yosefe
Copy link
Contributor Author

yosefe commented Jun 14, 2017

@bbenton @shamisp fixed the doc comments

@mellanox-github
Copy link
Contributor

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/1862/ for details.

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/3830/ for details (Mellanox internal link).

@mellanox-github
Copy link
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/3831/ for details (Mellanox internal link).

@bbenton
Copy link
Contributor

bbenton commented Jun 14, 2017

👍 for Doc/API

@shamisp
Copy link
Contributor

shamisp commented Jun 14, 2017

@yosefe - it is ready to go

@yosefe yosefe merged commit 44ba878 into openucx:master Jun 14, 2017
@yosefe yosefe deleted the topic/uct-api-no-slow-elem branch June 14, 2017 15:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants