-
Notifications
You must be signed in to change notification settings - Fork 423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GTEST: fixes to the uct sockaddr tests #4520
Conversation
fixes for #4331 |
Mellanox CI: FAILED on 2 of 25 workers (click for details)Note: the logs will be deleted after 05-Dec-2019
|
the failures are #4512 and OOM bot:pipe:retest |
bot:retest |
are all the counters incremented only by one thread? |
Mellanox CI: FAILED on 4 of 25 workers (click for details)Note: the logs will be deleted after 07-Dec-2019
|
in rdmacm - all but the err_count counter (in the iface tests) are incremented by the async thread. |
test/gtest/uct/ib/test_sockaddr.cc
Outdated
@@ -428,6 +433,7 @@ class test_uct_cm_sockaddr : public uct_test { | |||
|
|||
self->m_cm_state |= TEST_CM_STATE_CONNECT_REQUESTED; | |||
self->m_server_recv_req_cnt++; | |||
ucs_memory_cpu_store_fence(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
queue push should be before counter increment
test/gtest/uct/ib/test_sockaddr.cc
Outdated
self->server_recv_req++; | ||
} | ||
|
||
static ucs_status_t err_handler(void *arg, uct_ep_h ep, ucs_status_t status) | ||
{ | ||
test_uct_sockaddr *self = reinterpret_cast<test_uct_sockaddr*>(arg); | ||
|
||
ucs_memory_cpu_store_fence(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove, make err_count atomic instead
test/gtest/uct/ib/test_sockaddr.cc
Outdated
@@ -445,6 +451,7 @@ class test_uct_cm_sockaddr : public uct_test { | |||
server_connect_cb(uct_ep_h ep, void *arg, ucs_status_t status) { | |||
test_uct_cm_sockaddr *self = reinterpret_cast<test_uct_cm_sockaddr *>(arg); | |||
self->m_cm_state |= TEST_CM_STATE_SERVER_CONNECTED; | |||
ucs_memory_cpu_store_fence(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably could be removed
test/gtest/uct/ib/test_sockaddr.cc
Outdated
@@ -456,6 +463,7 @@ class test_uct_cm_sockaddr : public uct_test { | |||
self->m_server->disconnect(ep); | |||
} | |||
self->m_cm_state |= TEST_CM_STATE_SERVER_DISCONNECTED; | |||
ucs_memory_cpu_store_fence(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
test/gtest/uct/ib/test_sockaddr.cc
Outdated
@@ -477,6 +485,7 @@ class test_uct_cm_sockaddr : public uct_test { | |||
EXPECT_EQ(entity::server_priv_data, | |||
std::string(static_cast<const char *>(remote_data->conn_priv_data))); | |||
self->m_cm_state |= TEST_CM_STATE_CLIENT_CONNECTED; | |||
ucs_memory_cpu_store_fence(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
test/gtest/uct/ib/test_sockaddr.cc
Outdated
@@ -495,6 +504,7 @@ class test_uct_cm_sockaddr : public uct_test { | |||
} | |||
|
|||
self->m_cm_state |= TEST_CM_STATE_CLIENT_DISCONNECTED; | |||
ucs_memory_cpu_store_fence(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove
test/gtest/uct/ib/test_sockaddr.cc
Outdated
ucs::test_time_multiplier(); | ||
|
||
ucs_memory_cpu_load_fence(); | ||
while ((m_delayed_conn_reqs.size() == 0) && (ucs_get_time() < deadline)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while should wait for counter (conn requests) and then load_fence() and access the queue
bot:retest |
Mellanox CI: PASSED on 25 workers (click for details)Note: the logs will be deleted after 08-Dec-2019
|
Mellanox CI: FAILED on 4 of 25 workers (click for details)Note: the logs will be deleted after 08-Dec-2019
|
From AZP testing (Tests althca on worker 1): CXX uct/ib/gtest-test_sockaddr.o
In file included from /opt/azure/agent-03/AZP_WORKSPACE/1/s/contrib/../test/gtest/common/test_helpers.h:11,
from /opt/azure/agent-03/AZP_WORKSPACE/1/s/contrib/../test/gtest/common/test.h:10,
from /opt/azure/agent-03/AZP_WORKSPACE/1/s/contrib/../test/gtest/uct/ib/test_sockaddr.cc:7:
/opt/azure/agent-03/AZP_WORKSPACE/1/s/contrib/../test/gtest/common/gtest.h: In instantiation of ‘testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) [with T1 = int; T2 = volatile unsigned int]’:
/opt/azure/agent-03/AZP_WORKSPACE/1/s/contrib/../test/gtest/common/gtest.h:18938:23: required from ‘static testing::AssertionResult testing::internal::EqHelper<true>::Compare(const char*, const char*, const T1&, const T2&, typename testing::internal::EnableIf<(! testing::internal::is_pointer<T2>::value)>::type*) [with T1 = int; T2 = volatile unsigned int; typename testing::internal::EnableIf<(! testing::internal::is_pointer<T2>::value)>::type = void]’
/opt/azure/agent-03/AZP_WORKSPACE/1/s/contrib/../test/gtest/uct/ib/test_sockaddr.cc:170:5: required from here
/opt/azure/agent-03/AZP_WORKSPACE/1/s/contrib/../test/gtest/common/gtest.h:18862:16: error: comparison of integer expressions of different signedness: ‘const int’ and ‘const volatile unsigned int’ [-Werror=sign-compare]
if (expected == actual) {
~~~~~~~~~^~~~~~~~~
/opt/azure/agent-03/AZP_WORKSPACE/1/s/contrib/../test/gtest/common/gtest.h: In instantiation of ‘testing::AssertionResult testing::internal::CmpHelperEQ(const char*, const char*, const T1&, const T2&) [with T1 = long unsigned int; T2 = volatile int]’:
/opt/azure/agent-03/AZP_WORKSPACE/1/s/contrib/../test/gtest/common/gtest.h:18898:23: required from ‘static testing::AssertionResult testing::internal::EqHelper<lhs_is_null_literal>::Compare(const char*, const char*, const T1&, const T2&) [with T1 = long unsigned int; T2 = volatile int; bool lhs_is_null_literal = false]’
/opt/azure/agent-03/AZP_WORKSPACE/1/s/contrib/../test/gtest/uct/ib/test_sockaddr.cc:555:9: required from here
/opt/azure/agent-03/AZP_WORKSPACE/1/s/contrib/../test/gtest/common/gtest.h:18862:16: error: comparison of integer expressions of different signedness: ‘const long unsigned int’ and ‘const volatile int’ [-Werror=sign-compare]
cc1plus: all warnings being treated as errors
make[3]: *** [uct/ib/gtest-test_sockaddr.o] Error 1 |
Mellanox CI: PASSED on 25 workers (click for details)Note: the logs will be deleted after 09-Dec-2019
|
@yosefe is this good to go? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
squash
+ sockaddr over cm - fix the delayed server response test - wait for the server's recv_req_cnt increment. checking the queue's size right after connect() returns isn't correct since connect() may return before the request was added to the queue. + add fences in the tests to make it work correctly on hosts with a weak memory model.
c68c5a3
to
337b26c
Compare
Mellanox CI: PASSED on 25 workers (click for details)Note: the logs will be deleted after 10-Dec-2019
|
bot:pipe:retest |
No description provided.