RC_MLX5/IFACE: fixed assert #6046
hoopoepg commented Dec 18, 2020 (edited)
- fixed an issue in iface receive preposting when adaptive progress is enabled
- fixed assert: rc_mlx5_common.c:128 Assertion `rc_iface->rx.srq.available >= count' failed
- don't try to prepost receives twice
Force-pushed from d90a5cb to c2d9d08.
src/uct/ib/rc/accel/rc_mlx5_iface.c (outdated)
(iface->super.rx.srq.quota > 0)) { /* prepost recvs only if quota available
 * (recvs were not preposted before) */
- Nice catch 👍
- Does it fix the SRQ `available` assertion? Can you add this to the PR description?
- Maybe move the >0 check inside uct_rc_mlx5_iface_common_prepost_recvs?
- Please submit to v1.10.x as well.
IMO the >0 check is better kept here, because uct_rc_mlx5_iface_common_prepost_recvs
is called much more frequently, so the check would add overhead there.
I see uct_rc_mlx5_iface_common_prepost_recvs is called only from uct_rc_mlx5_iface_progress_enable and DC iface init. What am I missing?
BTW, it would probably be good to add ucs_assert(available>0) to uct_rc_mlx5_iface_srq_post_recv.
My fault, I mixed it up with uct_rc_mlx5_iface_srq_post_recv.
> is it fixing the srq avail assertion? can you add this to PR description?

Yes, updated the PR description and commit message.

> maybe move the >0 check inside uct_rc_mlx5_iface_common_prepost_recvs?

Moved to uct_rc_mlx5_iface_common_prepost_recvs.

> good to add ucs_assert(available>0) to uct_rc_mlx5_iface_srq_post_recv

Added a new assert.

> pls submit to v1.10.x as well

Will backport after merging into master.
Force-pushed from c2d9d08 to 1e5e1d8.
src/uct/ib/rc/accel/rc_mlx5_common.c (outdated)
uct_rc_mlx5_iface_srq_post_recv(iface);
/* prepost recvs only if quota available
 * (recvs were not preposted before) */
if (iface->super.rx.srq.quota > 0) {
minor: use an early return, `if (quota == 0) return;`, instead of wrapping the block in `if (quota > 0)`
yep, fixed
src/uct/ib/rc/accel/rc_mlx5_common.c (outdated)
iface->super.rx.srq.quota = 0;
uct_rc_mlx5_iface_srq_post_recv(iface);
/* prepost recvs only if quota available
 * (recvs were not preposted before) */
minor: wrap the line?
hmmm... unbeautified :)
@hoopoepg do you mean adaptive progress in the description?
- fixed issue in iface recv prepost when adaptive routing enabled
- fixed assert: rc_mlx5_common.c:128 Assertion `rc_iface->rx.srq.available >= count' failed
- don't try prepost recv twice
Force-pushed from 1e5e1d8 to daa69c5.
yes, fixed comment
@hoopoepg can you pls also port to v1.10.x and int3 branches?
io_demo issue
/azp run
Azure Pipelines successfully started running 1 pipeline(s).