-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UCP/UCT/RCACHE: API to return rcache usage information #126
Conversation
@brminich @alinask @evgeny-leksikov can you pls take a look? |
src/ucp/api/ucp.h
Outdated
* a callback then data was allocated, so if UCS_INPROGRESS is | ||
* returned from the callback, the data parameter will persist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: unrelated
src/ucp/api/ucp.h
Outdated
* Messages with a specific id. This callback is called whenever an Active | ||
* Message that was sent from the remote peer by @ref ucp_am_send_nb is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: unrelated
src/ucp/api/ucp.h
Outdated
* received on this worker. | ||
* | ||
* @param [in] worker UCP worker on which to set the Active Message | ||
* @param [in] worker UCP worker on which to set the Active Message |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: unrelated
src/ucp/api/ucp.h
Outdated
@@ -2433,7 +2452,7 @@ ucs_status_t ucp_worker_set_am_handler(ucp_worker_h worker, uint16_t id, | |||
* of the Active Message. | |||
* @param [in] count Number of elements to send. | |||
* @param [in] datatype Datatype descriptor for the elements in the buffer. | |||
* @param [in] cb Callback that is invoked upon completion of the | |||
* @param [in] cb Callback that is invoked upon completion of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: unrelated
UCS_CPU_SET(c, dst); | ||
} | ||
} | ||
memcpy(dst, src, sizeof(*dst)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it improve something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes: before the fix, we went over every set bit, and set it in 'dst', and this code took >90% of time when calling ucp_context_query
in a loop. after the fix, we just bulk-copy, and the overhead of ucp_context_query
is reduced.
src/ucp/api/ucp.h
Outdated
UCP_ATTR_FIELD_REQUEST_SIZE = UCS_BIT(0), /**< UCP request size */ | ||
UCP_ATTR_FIELD_THREAD_MODE = UCS_BIT(1) /**< UCP context thread flag */ | ||
UCP_ATTR_FIELD_THREAD_MODE = UCS_BIT(1), /**< UCP context thread flag */ | ||
UCP_ATTR_FIELD_NUM_PINNED_REGIONS = UCS_BIT(2), /**< Current pinned regions count */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: alignment
for (md_index = 0; md_index < context->num_mds; ++md_index) { | ||
uct_md_query(context->tl_mds[md_index].md, &md_attr); | ||
UCP_SET_ATTR_FIELD(attr, num_pinned_regions, | ||
UCP_ATTR_FIELD_NUM_PINNED_REGIONS, | ||
+= md_attr.rcache_attr.num_regions); | ||
UCP_SET_ATTR_FIELD(attr, num_pinned_bytes, | ||
UCP_ATTR_FIELD_NUM_PINNED_BYTES, | ||
+= md_attr.rcache_attr.total_size); | ||
UCP_SET_ATTR_FIELD(attr, num_pinned_evictions, | ||
UCP_ATTR_FIELD_NUM_PINNED_EVICTIONS, | ||
+= md_attr.rcache_attr.num_evictions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe break up the loop to 3 ones to minimize number of branches? on other hand, this approach implements better memory locality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it can be more complicated code since need to save all md_query results in an array, but if we set each field individually we can reuse the UCP_SET_ATTR_FIELD macro
rcache_attr->num_regions = rcache->num_regions; | ||
rcache_attr->total_size = rcache->total_size; | ||
rcache_attr->num_evictions = rcache->num_evictions; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe define ucs_rcache_attr_t
inside rcache
for counters and do memcpy
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'd prefer word-by-word copy so it will be atomic (to read rcache fields without a lock and get a self-consistent value)
334ec29
to
17a2255
Compare
17a2255
to
7c395d6
Compare
Why
API to return rcache eviction rate per context, to allow application adjust its buffer usage
How