UCT/RC/MLX5: Fix keepalive send condition with CQ moderation
- Send FC grant message (which is also used for keepalive) with CQ
  signaling enabled.
- Skip sending a keepalive message only if there are no unsignaled
  sends: if there are unsignaled sends, they could be completed already,
  so skipping a keepalive message could fail to detect a dead
  connection.
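
The reasoning in the second bullet can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not UCX code: the struct `ka_ep_state_t` and the functions `ka_skip_before_fix` and `ka_skip_after_fix` are hypothetical names, and the three fields merely mirror `txqp.available`, `tx.wq.bb_max` and `txqp.unsignaled` as they appear in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of the per-endpoint TX state. */
typedef struct {
    int16_t  available;  /* free send-queue slots, as last learned from completions */
    uint16_t bb_max;     /* total number of send-queue slots */
    uint16_t unsignaled; /* sends posted since the last signaled send */
} ka_ep_state_t;

/* Condition before the change: skip keepalive whenever the queue
 * looks busy, i.e. some slots have not been reported as free. */
static bool ka_skip_before_fix(const ka_ep_state_t *ep)
{
    return ep->available < ep->bb_max;
}

/* Condition after the change: skip only if the queue looks busy AND
 * the most recent send was signaled, so a completion (or an error)
 * is still expected for it. */
static bool ka_skip_after_fix(const ka_ep_state_t *ep)
{
    return (ep->available < ep->bb_max) && (ep->unsignaled == 0);
}
```

With `available = 60`, `bb_max = 64` and `unsignaled = 4`, the old predicate skips the keepalive even though all four sends may already have completed without producing a completion entry. The new predicate does not skip, so a keepalive is sent and a dead peer can still be detected.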
yosefe committed Sep 12, 2021
1 parent f9da7a1 commit 3712383
Showing 2 changed files with 6 additions and 3 deletions.
2 changes: 1 addition & 1 deletion src/uct/ib/rc/accel/rc_mlx5_ep.c
@@ -589,7 +589,7 @@ ucs_status_t uct_rc_mlx5_ep_fc_ctrl(uct_ep_t *tl_ep, unsigned op,
NULL, 0,
UCT_RC_EP_FC_PURE_GRANT, 0, 0,
0, 0,
-NULL, NULL, 0, 0,
+NULL, NULL, 0, MLX5_WQE_CTRL_CQ_UPDATE,
INT_MAX);
return UCS_OK;
}
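
As background for the hunk above, here is a minimal sketch of how CQ moderation interacts with a forced signaling flag. It is an illustration under assumptions, not the UCX implementation: `tx_state_t`, `tx_post` and `SKETCH_CQ_UPDATE` are hypothetical names, and the flag value is a stand-in chosen for this example only.

```c
#include <stdint.h>

/* Stand-in for a "request a completion" flag; the value is arbitrary here. */
#define SKETCH_CQ_UPDATE 0x8u

/* Hypothetical TX moderation state. */
typedef struct {
    uint16_t unsignaled; /* sends posted since the last signaled one */
    uint16_t moderation; /* request a completion after this many unsignaled sends */
} tx_state_t;

/* Decide the flags for one posted send and update the counter.
 * Callers may force signaling by passing SKETCH_CQ_UPDATE. */
static uint8_t tx_post(tx_state_t *tx, uint8_t force_flags)
{
    uint8_t flags = force_flags;

    if (tx->unsignaled >= tx->moderation) {
        flags |= SKETCH_CQ_UPDATE; /* moderation threshold reached */
    }

    if (flags & SKETCH_CQ_UPDATE) {
        tx->unsignaled = 0;        /* a completion will cover all earlier sends */
    } else {
        tx->unsignaled++;          /* no completion requested for this send */
    }
    return flags;
}
```

In this model, a send posted with the flag forced leaves `unsignaled` at 0 regardless of the moderation threshold. One plausible reading of the change is that a signaled grant therefore always produces a completion, so the endpoint state stays consistent with the `unsignaled == 0` check in the keepalive progress loop.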
7 changes: 5 additions & 2 deletions src/uct/ib/rc/accel/rc_mlx5_iface.c
@@ -173,10 +173,13 @@ uct_rc_mlx5_common_ka_progress(uct_rc_mlx5_iface_common_t *iface)

ucs_spin_lock(&iface->super.ep_list_lock);
ucs_list_for_each(ep, &iface->super.ep_list, super.list) {
-if (ep->super.txqp.available < ep->tx.wq.bb_max) {
-/* have outstanding operations */
+if ((ep->super.txqp.available < ep->tx.wq.bb_max) &&
+(ep->super.txqp.unsignaled == 0)) {
+/* Have outstanding uncompleted operations - no need to send
+keepalive message */
continue;
}

ucs_trace("send keepalive grant on ep %p", ep);
uct_rc_ep_fc_send_grant(&ep->super);
}