Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes for endpoint flush #3054

Merged
merged 3 commits into from
Nov 26, 2018
Merged

Fixes for endpoint flush #3054

merged 3 commits into from
Nov 26, 2018

Conversation

yosefe
Copy link
Contributor

@yosefe yosefe commented Nov 22, 2018

No description provided.

@yosefe yosefe added the Bugfix label Nov 22, 2018
@yosefe yosefe added this to the v1.5.0 milestone Nov 22, 2018
@swx-jenkins1
Copy link

Test FAILed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5629/ for details.

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8282/ for details (Mellanox internal link).

@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5631/ for details.

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8293/ for details (Mellanox internal link).

We must not update cached tail in uct_mm_ep_flush() if there are any
pending elements. We may get new send resources but not use them, so
flush could return UCS_OK while there pending requests.

Fixes openucx#3052
- Ignore remote completions in case of forced flush (CLOSE_MODE_CANCEL)

- The UCP_EP_FLAG_FLUSH_STATE_VALID flag can't be used because it's not
  valid in release mode. Instead, use the UCP_EP_FLAG_DEST_EP flag as an
  indirect indication that we may have some operations which wait for
  software rma/amo completion.

- Fix missing initializion of flush state in case of client/server
  connection establishment with p2p lanes and *without* pre-request.
@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5632/ for details.

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8295/ for details (Mellanox internal link).

@mellanox-github
Copy link
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8296/ for details (Mellanox internal link).

@yosefe
Copy link
Contributor Author

yosefe commented Nov 24, 2018

@brminich @evgeny-leksikov pls take a look
@evgeny-leksikov pls note the small fix in client/server wireup flow with removing p2p lanes, i hope i am not missing something here..

ucs_queue_push(&flush_state->reqs, &req->send.flush.queue);
ucs_trace_req("added flush request %p to ep remote completion queue"
" with sn %d", req, req->send.flush.cmpl_sn);
/* All pending requires were sent, so 'send_sn' value is up-to-date */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: requires->requests

@swx-jenkins1
Copy link

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/5639/ for details.

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8303/ for details (Mellanox internal link).

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8308/ for details (Mellanox internal link).

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8310/ for details (Mellanox internal link).

@mellanox-github
Copy link
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8313/ for details (Mellanox internal link).

@mellanox-github
Copy link
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/8314/ for details (Mellanox internal link).

@yosefe yosefe merged commit b8f6b86 into openucx:master Nov 26, 2018
@yosefe yosefe deleted the topic/flush-fixes branch November 26, 2018 00:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants