UCP/API/TEST: Add non-blocking endpoint flush and use it for RMA tests #1912

Merged: yosefe merged 3 commits into openucx:master from topic/ucp-nb-flush on Oct 31, 2017

yosefe
Contributor

@yosefe yosefe commented Oct 13, 2017

Fixes #1641

@mellanox-github
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4815/ for details (Mellanox internal link).

@yosefe
Contributor Author

yosefe commented Oct 13, 2017

00:39:25 [ RUN      ] dcx/test_ucp_perf.envelope/0
00:39:26 [     INFO ]                tag latency : 1.536 usec
00:39:27 [     INFO ]            tag iov latency : 4.529 usec
00:39:28 [     INFO ]                     tag mr : 2.083 Mpps
00:39:30 [     INFO ]                tag sync mr : 1.827 Mpps
00:39:31 [     INFO ]                tag wild mr : 1.597 Mpps
00:39:32 [1507930771.748390] [hpc-test-node2:21162:0]          flush.c:336  UCX  WARN  flush failed: No resources are available to initiate the operation
00:39:32 [     INFO ]                     tag bw : 992.233 MB/sec
00:39:32 [     INFO ]         tag bw_zcopy_multi : 902.404 MB/sec
00:39:33 [     INFO ]                put latency : 2.281 usec
00:39:33 [     INFO ]                   put rate : 5.266 Mpps
00:39:34 [     INFO ]                     put bw : 4232.287 MB/sec
00:39:34 [     INFO ]                get latency : 2.175 usec
00:39:35 [     INFO ]                     get bw : 3634.934 MB/sec
00:39:36 [     INFO ]            atomic add rate : 1.008 Mpps
00:39:36 [     INFO ]        atomic fadd latency : 2.069 usec
00:39:37 [     INFO ]        atomic swap latency : 1.982 usec
00:39:37 [     INFO ]       atomic cswap latency : 2.111 usec
00:39:37 /scrap/jenkins/scrap/workspace/hpc-ucx-pr/label/hpc-test-node2/worker/3/contrib/../test/gtest/common/test.cc:244: Failure
00:39:37 Failed
00:39:37 Got 1 warnings during the test
00:39:37 
[  FAILED  ] dcx/test_ucp_perf.envelope/0, where GetParam() = \dc_mlx5 (11925 ms)

@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2810/ for details.

The blocking version of endpoint flush may potentially cause a deadlock
because it does not progress communications on anything except the
current worker. Introduce a non-blocking flush and use it for unit
tests.
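
To illustrate the deadlock described in the commit message above, here is a minimal self-contained sketch in C. It is a toy model, not UCX code: `toy_worker_t`, `toy_send`, `toy_progress` and every other `toy_` name are hypothetical, and the rule that a flush completes only once the peer worker has processed the message is a simplifying assumption made for illustration. The bounded spin count stands in for what would otherwise be an endless loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: every name below is hypothetical, none of it is UCX API. */
typedef struct toy_worker toy_worker_t;
struct toy_worker {
    toy_worker_t *peer;  /* the worker on the other side of the connection */
    int inbox;           /* messages received but not yet processed */
    int sent;            /* messages this worker has sent */
    int acked;           /* acknowledgements received for those messages */
};

/* Send one message: it waits in the peer's inbox until the peer is progressed. */
static void toy_send(toy_worker_t *w)
{
    assert(w->peer != NULL);
    w->sent++;
    w->peer->inbox++;
}

/* Progress one worker: process its inbox and acknowledge every message. */
static void toy_progress(toy_worker_t *w)
{
    w->peer->acked += w->inbox;
    w->inbox = 0;
}

/* Assumption: a flush is complete once every sent message was acknowledged. */
static bool toy_flush_done(const toy_worker_t *w)
{
    return w->acked == w->sent;
}

/* Blocking style: spins on the current worker only (bounded for the demo). */
static bool toy_flush_blocking(toy_worker_t *w, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (toy_flush_done(w)) {
            return true;
        }
        toy_progress(w);
    }
    return toy_flush_done(w);
}

/* Non-blocking style: the caller owns the loop and progresses all workers. */
static bool toy_flush_with_all_workers(toy_worker_t *w, toy_worker_t **all,
                                       size_t count, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (toy_flush_done(w)) {
            return true;
        }
        for (size_t j = 0; j < count; j++) {
            toy_progress(all[j]);
        }
    }
    return toy_flush_done(w);
}

/* Returns 0 when both styles behave as described above. */
static int toy_demo(void)
{
    toy_worker_t a = {0}, b = {0};
    toy_worker_t *all[] = { &a, &b };

    a.peer = &b;
    b.peer = &a;
    toy_send(&a);

    if (toy_flush_blocking(&a, 1000)) {
        return 1; /* unexpected: nobody progressed b, so no ack can arrive */
    }
    if (!toy_flush_with_all_workers(&a, all, 2, 1000)) {
        return 2; /* unexpected: progressing b should deliver the ack */
    }
    return 0;
}
```

In this model `toy_demo()` returns 0: the flush that progresses only worker `a` exhausts its spin budget, while the loop that progresses both workers completes once `b` has been progressed. This mirrors a unit test that hosts both sides of a connection in one process.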
@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2816/ for details.

@mellanox-github
Contributor

Test FAILed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4820/ for details (Mellanox internal link).

@swx-jenkins1

Test FAILed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2820/ for details.

@yosefe
Contributor Author

yosefe commented Oct 14, 2017

bot:bgate:retest

@mellanox-github
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4823/ for details (Mellanox internal link).

@swx-jenkins1

Test FAILed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2821/ for details.

@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2827/ for details.

@shamisp
Contributor

shamisp commented Oct 17, 2017

We need an example of how a blocking flush can be implemented, since the non-blocking calls will almost always be used in a blocking way. Otherwise, it looks good.

@yosefe
Contributor Author

yosefe commented Oct 18, 2017

@shamisp there is an example in flush.c (the blocking flush is now implemented over the non-blocking one), and also in the unit tests.
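
As a rough illustration of what "blocking flush implemented over the non-blocking one" can look like, here is a self-contained sketch in C. It does not reproduce flush.c or the ucp.h example. All `toy_` types and functions are hypothetical stand-ins so that the code compiles without UCX, and returning a status code plus a caller-provided request is a simplification chosen here, not the UCX calling convention. In UCX itself the corresponding calls are, to my understanding, `ucp_ep_flush_nb`, `ucp_worker_progress` and `ucp_request_check_status`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only, NOT the UCX API. */
typedef enum {
    TOY_OK              = 0,
    TOY_INPROGRESS      = 1,
    TOY_ERR_NO_RESOURCE = -2
} toy_status_t;

typedef struct {
    int outstanding;  /* operations still in flight on this endpoint */
    int closed;       /* nonzero: the endpoint cannot start new operations */
} toy_ep_t;

typedef struct {
    toy_ep_t *ep;     /* endpoint whose completion this request tracks */
} toy_request_t;

/* Start a flush without waiting. Three outcomes: error, done, in progress. */
static toy_status_t toy_ep_flush_nb(toy_ep_t *ep, toy_request_t *req)
{
    if (ep->closed) {
        return TOY_ERR_NO_RESOURCE;
    }
    if (ep->outstanding == 0) {
        return TOY_OK;            /* nothing in flight: completed immediately */
    }
    req->ep = ep;
    return TOY_INPROGRESS;
}

/* One progress step: in this toy it retires a single in-flight operation. */
static void toy_ep_progress(toy_ep_t *ep)
{
    if (ep->outstanding > 0) {
        ep->outstanding--;
    }
}

/* Poll a request started by toy_ep_flush_nb. */
static toy_status_t toy_request_check_status(const toy_request_t *req)
{
    return (req->ep->outstanding == 0) ? TOY_OK : TOY_INPROGRESS;
}

/* Blocking flush built on the non-blocking one: progress until done. */
static toy_status_t toy_ep_flush_blocking(toy_ep_t *ep)
{
    toy_request_t req = { NULL };
    toy_status_t status = toy_ep_flush_nb(ep, &req);

    while (status == TOY_INPROGRESS) {
        toy_ep_progress(ep);
        status = toy_request_check_status(&req);
    }
    return status;  /* final status, including an immediate error */
}

/* Returns 0 when all three outcomes are handled as expected. */
static int toy_ep_demo(void)
{
    toy_ep_t idle = { 0, 0 }, busy = { 3, 0 }, closed = { 1, 1 };

    if (toy_ep_flush_blocking(&idle) != TOY_OK) {
        return 1;
    }
    if (toy_ep_flush_blocking(&busy) != TOY_OK || busy.outstanding != 0) {
        return 2;
    }
    if (toy_ep_flush_blocking(&closed) != TOY_ERR_NO_RESOURCE) {
        return 3;
    }
    return 0;
}
```

The wrapper returns whatever status the flush ends with, so an immediate failure is reported to the caller instead of being swallowed. A real caller that hosts several workers would progress all of them inside the loop rather than only one.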

@shamisp
Contributor

shamisp commented Oct 19, 2017

I think we need to put an example in ucp.h, where it is exposed, so that it goes into the spec.

@shamisp
Contributor

shamisp commented Oct 19, 2017

@bbenton please take a look. My only comment for now: we should have some good examples in the spec to cover the concept. People are not used to working with non-blocking flush.

+ Add API example for how to implement blocking flush.
+ Fix return status from blocking flush compatibility functions.
+ Use non-blocking flush in hello_world example.
@swx-jenkins1

Test PASSed.
See http://bgate.mellanox.com/jenkins/job/gh-ucx-pr/2864/ for details.

@mellanox-github
Contributor

Test PASSed.
See http://hpc-master.lab.mtl.com:8080/job/hpc-ucx-pr/4885/ for details (Mellanox internal link).

@yosefe
Contributor Author

yosefe commented Oct 21, 2017

@shamisp fixed

@shamisp
Contributor

shamisp commented Oct 21, 2017 via email

@yosefe
Contributor Author

yosefe commented Oct 25, 2017

@bbenton can you pls take a look?

@shamisp
Contributor

shamisp commented Oct 27, 2017

@bbenton - can you please take a look

@gmegan

gmegan commented Oct 30, 2017

+1

@yosefe yosefe merged commit c9151b1 into openucx:master Oct 31, 2017
@yosefe yosefe deleted the topic/ucp-nb-flush branch October 31, 2017 07:11
5 participants