Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCP/CORE: Reduce size of generic EP extension for further usage #5848

Merged
merged 1 commit into from
Nov 5, 2020

Conversation

dmitrygx
Copy link
Member

What

Reduce the size of generic EP extension for further usage

Why ?

Generic EP extension will be used by #5821 to have ucs_hlist_head_t for tracking UCP requests.

How ?

Introduce a pointer to a control extension that is placed in the generic extension. Control extension includes:

  • local and remote IDs
  • error callback
  • union for close_req and listener
    other fields are in generic extension, since they are could not be moved to a control extension (e.g. we have to have a mechanism to translate flush_state or conn_match to generic extension, or even to EP)

@dmitrygx
Copy link
Member Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@dmitrygx
Copy link
Member Author

@hoopoepg @brminich @evgeny-leksikov could you review pls?

@dmitrygx
Copy link
Member Author

@yosefe could you review pls?

Comment on lines 108 to 110
ucp_ep_ext_gen(ep)->control_ext = ucs_calloc(
1, sizeof(ucp_ep_ext_control_t),
"ep_control_ext");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 102 to 108
ep->cfg_index = UCP_WORKER_CFG_INDEX_NULL;
ep->worker = worker;
ep->am_lane = UCP_NULL_LANE;
ep->flags = 0;
ep->conn_sn = UCP_EP_MATCH_CONN_SN_MAX;
ucp_ep_ext_gen(ep)->user_data = NULL;
ucp_ep_ext_gen(ep)->control_ext = ucs_calloc(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can reduce number of spaces

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

UCS_STATIC_ASSERT(sizeof(ucp_ep_ext_gen(ep)->ep_match) >=
sizeof(ucp_ep_ext_gen(ep)->flush_state));
UCS_STATIC_ASSERT(sizeof(ucp_ep_ext_control(ep)->listener) ==
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it required?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not needed
removed

/* Endpoint match context and remote completion status are mutually exclusive,
* since remote completions are counted only after the endpoint is already
* matched to a remote peer.
/* Endpoint match context is mutually exclusive with:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this comment relevant now? match is not here and they are not in the union anymore

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, should remove it
and use the old one

Comment on lines 225 to 228
ucs_queue_head_init(&flush_state->reqs);
flush_state->send_sn = 0;
flush_state->cmpl_sn = 0;
ucs_queue_head_init(&flush_state->reqs);
ep->flags |= UCP_EP_FLAG_FLUSH_STATE_VALID;
ep->flags |= UCP_EP_FLAG_FLUSH_STATE_VALID;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unrelated?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@dmitrygx
Copy link
Member Author

@yosefe could you review pls?

@dmitrygx
Copy link
Member Author

the failure is #5856

@dmitrygx
Copy link
Member Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@dmitrygx
Copy link
Member Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@brminich
Copy link
Contributor

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@dmitrygx
Copy link
Member Author

dmitrygx commented Nov 2, 2020

bot:pipe:retest

@dmitrygx
Copy link
Member Author

dmitrygx commented Nov 2, 2020

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

* Endpoint extension for control data path
*/
typedef struct {
ucp_ep_ids_t ids; /* Local and remote IDs */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra pointer deref for rndv/am protocols?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that remote_id is used for AMO/RMA/Stream. Is it used for rndv/am as well?
looks like we have to have it in ucp_ep_ext_gen_t

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, it was a double pointer deref before this PR, since IDs were allocated on a heap
ucp_ep_ext_gen_t* -> ucp_ep_ids_t*
with this PR we have:
ucp_ep_ext_gen_t* -> ucp_ep_ext_control_t* and offset to the [0] or [1] ID

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but if we would like to avoid it
we need to allocate 3 strides instead of having a pointer to control extension in general extension + we could move ucp_ep_match_elem_t field there (it is on general extension now, since we have to have an access to EP in UCP/EP_MATCH code)
wdyt?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move ep id's to a stride extension and all the rest to the heap?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decoupled local and remote IDs as dicussed

@dmitrygx
Copy link
Member Author

dmitrygx commented Nov 4, 2020

bot:pipe:retest

@dmitrygx
Copy link
Member Author

dmitrygx commented Nov 5, 2020

@yosefe could you review pls?

@dmitrygx
Copy link
Member Author

dmitrygx commented Nov 5, 2020

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@dmitrygx
Copy link
Member Author

dmitrygx commented Nov 5, 2020

@yosefe do we need to restart IO demo and RTD somehow?

@yosefe
Copy link
Contributor

yosefe commented Nov 5, 2020

@yosefe do we need to restart IO demo and RTD somehow?

no need

@yosefe
Copy link
Contributor

yosefe commented Nov 5, 2020

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@dmitrygx
Copy link
Member Author

dmitrygx commented Nov 5, 2020

failure is #5875
bot:pipe:retest

@dmitrygx
Copy link
Member Author

dmitrygx commented Nov 5, 2020

IO demo and RTD failures are unrelated
@yosefe ok to merge?

@yosefe yosefe merged commit d669d54 into openucx:master Nov 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants