Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCP/EP: Add eager multi-fragment overhead #7967

Merged
merged 1 commit into from
Mar 5, 2022

Conversation

Artemy-Mellanox
Copy link
Contributor

No description provided.


if (eager) {
params->bw += 1.0 / ((1.0 / bw) + (iface_attr->overhead /
iface_attr->cap.am.max_bcopy));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it possible max_bcopy would be 0?

bw = ucp_tl_iface_bandwidth(context, &iface_attr->bandwidth);

if (eager) {
params->bw += 1.0 / ((1.0 / bw) + (iface_attr->overhead /
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add comment:
/* eager protocol has overhead for each fragment */

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use
overhead = send_post_overhead + send_pre_overhead (from estimate_perf)

@@ -1749,7 +1751,14 @@ static void ucp_ep_config_calc_params(ucp_worker_h worker,
}
}

params->bw += ucp_tl_iface_bandwidth(context, &iface_attr->bandwidth);
bw = ucp_tl_iface_bandwidth(context, &iface_attr->bandwidth);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can remove space line

@@ -360,13 +360,21 @@ void test_ucp_peer_failure::do_test(size_t msg_size, int pre_msg_count,
EXPECT_NE(UCS_OK, m_err_status);

if (UCS_PTR_IS_PTR(sreq)) {
/* The request may either succeed or fail, even though the data is
ucs_status_t status;
/* TODO update comment
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. why needed? if m_err_count!=0, i would expect the request to also complete
  2. update comment
  3. can we use request_wait()?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

discussed offline:

  1. use request_wait
  2. update comment: "If rendezvous protocol is used, the m_err_count is increased on the receiver side, so the send request may not complete immediately"

src/ucp/core/ucp_ep.c Outdated Show resolved Hide resolved
src/ucp/core/ucp_ep.c Outdated Show resolved Hide resolved
@Artemy-Mellanox Artemy-Mellanox force-pushed the topic/fuj_perf_old branch 2 times, most recently from df01e95 to 987b782 Compare March 2, 2022 15:51
Comment on lines 1767 to 1768
status = uct_iface_estimate_perf(wiface->iface, &perf_attr);
ucs_assert(status == UCS_OK);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fail config_init

yosefe
yosefe previously approved these changes Mar 3, 2022
@@ -1812,17 +1826,23 @@ static size_t ucp_ep_config_calc_rndv_thresh(ucp_worker_t *worker,
(2 * eager_zcopy.overhead) + rndv.overhead) -
eager_zcopy.reg_overhead - eager_zcopy.overhead;

denumerator = eager_zcopy.reg_growth +
denominator = eager_zcopy.reg_growth +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah..

@yosefe
Copy link
Contributor

yosefe commented Mar 4, 2022

@Artemy-Mellanox can you pls squash?

@yosefe yosefe closed this Mar 4, 2022
@yosefe yosefe reopened this Mar 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants