Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RC_MLX5/IFACE: fixed assert #6046

Merged
merged 1 commit into from
Dec 18, 2020

Conversation

hoopoepg
Copy link
Contributor

@hoopoepg hoopoepg commented Dec 18, 2020

  • fixed issue in iface recv prepost when adaptive progress enabled
  • fixed assert: rc_mlx5_common.c:128 Assertion `rc_iface->rx.srq.available >= count' failed
  • don't try prepost recv twice

@hoopoepg hoopoepg force-pushed the topic/fixed-assert-adaptive-routing branch from d90a5cb to c2d9d08 Compare December 18, 2020 11:32
Comment on lines 220 to 221
(iface->super.rx.srq.quota > 0)) { /* prepost recvs only if quota available
* (recvs were not preposted before) */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. nice catch 👍
  2. is it fixing the srq avail assertion? can you add this to PR description?
  3. maybe move the >0 check inside uct_rc_mlx5_iface_common_prepost_recvs?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. pls submit to v1.10.x as well

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imo, >0 check better to keep here, because uct_rc_mlx5_iface_common_prepost_recvs is called much more frequently, so check would add additional overhead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see uct_rc_mlx5_iface_common_prepost_recvs is called only from uct_rc_mlx5_iface_progress_enable and dc iface init, what am i missing?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, probably good to add ucs_assert(available>0) to uct_rc_mlx5_iface_srq_post_recv

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may fault, mixed it with uct_rc_mlx5_iface_srq_post_recv

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it fixing the srq avail assertion? can you add this to PR description?

yes, updated PR/commit comment

maybe move the >0 check inside uct_rc_mlx5_iface_common_prepost_recvs?

moved to uct_rc_mlx5_iface_common_prepost_recvs

good to add ucs_assert(available>0) to uct_rc_mlx5_iface_srq_post_recv

added new assert

pls submit to v1.10.x as well

will do backport after merge into master

@hoopoepg hoopoepg force-pushed the topic/fixed-assert-adaptive-routing branch from c2d9d08 to 1e5e1d8 Compare December 18, 2020 12:50
uct_rc_mlx5_iface_srq_post_recv(iface);
/* prepost recvs only if quota available
* (recvs were not preposted before) */
if (iface->super.rx.srq.quota > 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: if quota==0 return

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, fixed

iface->super.rx.srq.quota = 0;
uct_rc_mlx5_iface_srq_post_recv(iface);
/* prepost recvs only if quota available
* (recvs were not preposted before) */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: wrap the line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm... unbuitified :)

@dmitrygx
Copy link
Member

  • fixed issue in iface recv prepost when adaptive routing enabled

@hoopoepg do you mean adaptive progress in the description?

- fixed issue in iface recv prepost when adaptive routing enabled
- fixed assert: rc_mlx5_common.c:128  Assertion `rc_iface->rx.srq.available >= count' failed
- don't try prepost recv twice
@hoopoepg hoopoepg force-pushed the topic/fixed-assert-adaptive-routing branch from 1e5e1d8 to daa69c5 Compare December 18, 2020 13:12
@hoopoepg
Copy link
Contributor Author

@hoopoepg do you mean adaptive progress in the description?

yes, fixed comment

@yosefe
Copy link
Contributor

yosefe commented Dec 18, 2020

@hoopoepg can you pls also port to v1.10.x and int3 branches?

@hoopoepg
Copy link
Contributor Author

io_demo issue

@hoopoepg
Copy link
Contributor Author

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@yosefe yosefe merged commit 423d30f into openucx:master Dec 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants