Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RC and iodemo fixes #71

Merged
merged 4 commits into from
Oct 27, 2020
Merged

RC and iodemo fixes #71

merged 4 commits into from
Oct 27, 2020

Conversation

yosefe
Copy link
Owner

@yosefe yosefe commented Oct 27, 2020

  • Reserve 1 CQ credit for qp-flush NOP, to avoid disabling CQ moderation
  • Fix keepalive on DevX
  • Fix memory leak in IO demo
  • Simplify active connection counting

@yosefe
Copy link
Owner Author

yosefe commented Oct 27, 2020

@evgeny-leksikov pls take a look

@@ -671,9 +673,14 @@ class DemoServer : public P2pDemoCommon {
recv_data(conn, *iov, msg->tr.sn, w);
}

virtual void dispatch_connection_accepted(UcxConnection* conn) {\
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls del \

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines 140 to 142
bool add_connection(UcxConnection *conn);

bool remove_connection(UcxConnection *conn);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return value is not handled anywhere now, can be void with new virtual func or maybe assert for debug

virtual void dispatch_connection_error(UcxConnection *conn) {
LOG << "deleting connection with status "
<< ucs_status_string(conn->ucx_status());
--_curr_state.active_conns;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can server be an initiator of disconnect?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no


virtual bool remove_connection(UcxConnection *conn);
// Called when new server connection is accepted
virtual void dispatch_connection_accepted(UcxConnection* conn);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO better to avoid functions in base class which are relevant only for one inheritor, add_conection looks more generic

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add/remove connection was intended to be private helper to manage conn_id hash and not virtual

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

When QP is modified by DevX, the 'state' field may not be updated
Reserve a CQ entry for NOP operation during RC QP flush. Only one is
needed since this is a blocking operation. This removes the need to
disable TX moderation to avoid deadlock during flush due to lack of CQ
resources to post a single NOP (which is needed to gen completions for
unsignaled WQEs)
The client counts incorrectly the number of outstanding ops: when a
connection is closed, _num_sent is decreased, but it should not be,
since read requests also complete with error status and _num_completions
is increased. As a result, the client sends more and more outstanding
operations after every terminated connection, which causes increase in
memory usage.
The fix is to first close the connection, which will restore completions
for read requests, and then restore the remaining credits for write
requests - which will never complete on their own because they are
waiting for write completion io-message.
@yosefe yosefe merged commit aa41625 into integration3 Oct 27, 2020
@yosefe yosefe deleted the topic/rc-and-iodemo-fixes branch October 27, 2020 11:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants