Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCP/UCT/RCACHE: API to return rcache usage information #126

Merged
merged 1 commit into from
Jan 20, 2021

Conversation

yosefe
Copy link
Owner

@yosefe yosefe commented Jan 19, 2021

Why

API to return rcache eviction rate per context, to allow application adjust its buffer usage

How

  • Return the information from rcache to UCT md
  • UCP context collects (sum) counters from all UCT memory domains
  • Optimize uct_ib_md_query() so it could be called periodically without a performance impact

@yosefe
Copy link
Owner Author

yosefe commented Jan 19, 2021

@brminich @alinask @evgeny-leksikov can you pls take a look?

Comment on lines 463 to 464
* a callback then data was allocated, so if UCS_INPROGRESS is
* returned from the callback, the data parameter will persist
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: unrelated

Comment on lines 2416 to 2417
* Messages with a specific id. This callback is called whenever an Active
* Message that was sent from the remote peer by @ref ucp_am_send_nb is
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: unrelated

* received on this worker.
*
* @param [in] worker UCP worker on which to set the Active Message
* @param [in] worker UCP worker on which to set the Active Message
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: unrelated

@@ -2433,7 +2452,7 @@ ucs_status_t ucp_worker_set_am_handler(ucp_worker_h worker, uint16_t id,
* of the Active Message.
* @param [in] count Number of elements to send.
* @param [in] datatype Datatype descriptor for the elements in the buffer.
* @param [in] cb Callback that is invoked upon completion of the
* @param [in] cb Callback that is invoked upon completion of the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: unrelated

UCS_CPU_SET(c, dst);
}
}
memcpy(dst, src, sizeof(*dst));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it improve something?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes: before the fix, we went over every set bit, and set it in 'dst', and this code took >90% of time when calling ucp_context_query in a loop. after the fix, we just bulk-copy, and the overhead of ucp_context_query is reduced.

Comment on lines 328 to 330
UCP_ATTR_FIELD_REQUEST_SIZE = UCS_BIT(0), /**< UCP request size */
UCP_ATTR_FIELD_THREAD_MODE = UCS_BIT(1) /**< UCP context thread flag */
UCP_ATTR_FIELD_THREAD_MODE = UCS_BIT(1), /**< UCP context thread flag */
UCP_ATTR_FIELD_NUM_PINNED_REGIONS = UCS_BIT(2), /**< Current pinned regions count */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: alignment

Comment on lines +1658 to +1668
for (md_index = 0; md_index < context->num_mds; ++md_index) {
uct_md_query(context->tl_mds[md_index].md, &md_attr);
UCP_SET_ATTR_FIELD(attr, num_pinned_regions,
UCP_ATTR_FIELD_NUM_PINNED_REGIONS,
+= md_attr.rcache_attr.num_regions);
UCP_SET_ATTR_FIELD(attr, num_pinned_bytes,
UCP_ATTR_FIELD_NUM_PINNED_BYTES,
+= md_attr.rcache_attr.total_size);
UCP_SET_ATTR_FIELD(attr, num_pinned_evictions,
UCP_ATTR_FIELD_NUM_PINNED_EVICTIONS,
+= md_attr.rcache_attr.num_evictions);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe break up the loop to 3 ones to minimize number of branches? on other hand, this approach implements better memory locality

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it can be more complicated code since need to save all md_query results in an array, but if we set each field individually we can reuse the UCP_SET_ATTR_FIELD macro

Comment on lines +752 to +754
rcache_attr->num_regions = rcache->num_regions;
rcache_attr->total_size = rcache->total_size;
rcache_attr->num_evictions = rcache->num_evictions;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe define ucs_rcache_attr_t inside rcache for counters and do memcpy here?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd prefer word-by-word copy so it will be atomic (to read rcache fields without a lock and get a self-consistent value)

@yosefe yosefe merged commit 311cdd0 into integration3 Jan 20, 2021
@yosefe yosefe deleted the topic/rcache-info-api branch January 20, 2021 13:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants