Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GO/TESTS: Use serialized thread mode #9955

Merged
merged 1 commit into from
Jun 17, 2024
Merged

Conversation

tvegas1
Copy link
Contributor

@tvegas1 tvegas1 commented Jun 13, 2024

What

Seems goroutines can end up being scheduled on different OS threads.

Why ?

When used in single thread mode, UCX must be always running on the same thread.

@brminich brminich requested a review from rakhmets June 13, 2024 07:26
@rakhmets
Copy link
Collaborator

rakhmets commented Jun 13, 2024

What issue is being solved by these changes?
I guess you need runtime.LockOSThread(). Because this is the only way to force Go to use the same thread.
Please take a look at #9581.

@tvegas1
Copy link
Contributor Author

tvegas1 commented Jun 13, 2024

What issue is being solved by these changes?

The goroutine is being run on different OS threads, which ends up operating UCX worker on different OS threads.

Doing so can cause corruption. For instance, in single thread mode for IB, we use bf_post instead of bf_post_mt which skips using a store fence when sending data. I am not aware of any other difference apart this added barrier.

We then added #9918 to enforce single thread usage, which is now triggering issue in go tests.

I guess you need runtime.LockOSThread(). Because this is the only way to force Go to use the same thread. Please take a look at #9581.

Agree there is similarity, but here we don't even need to be as stringent. Having the worker being freely non-concurrently operated from multiple OS threads is fine (the goroutine), as long as we create it in serialized mode.

@rakhmets rakhmets enabled auto-merge June 14, 2024 10:07
@brminich
Copy link
Contributor

/azp run

Copy link

Azure Pipelines successfully started running 4 pipeline(s).

@brminich
Copy link
Contributor

wire-compat failure is a known issue related to a specific server config

@brminich
Copy link
Contributor

merging this PR to unblock CI

@brminich brminich disabled auto-merge June 17, 2024 07:01
@brminich brminich merged commit 0bda7b9 into openucx:master Jun 17, 2024
154 of 156 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants