Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix data race when joining the task runner #1493

Merged
merged 8 commits into from
Jun 13, 2023
Merged

Fix data race when joining the task runner #1493

merged 8 commits into from
Jun 13, 2023

Conversation

git-hulk
Copy link
Member

This may close #1448

Currently, server->Join will join the task runner before the cron thread, so it may have the data race since the cron thread may access the task runner as well. We can avoid this by removing the task runner join which will do in the task runner destructor.

Currently, server->Join will join the task runner before the cron
thread, so it may have the data race since the cron thread may acess the
task runner as well. We can avoid this by removing the task runner join
which will do in the task runner destructor.
torwig
torwig previously approved these changes Jun 11, 2023
Copy link
Contributor

@torwig torwig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.
Sounds legitimate to me.

@git-hulk
Copy link
Member Author

@torwig Thanks for your quick review.

@PragmaTwice
Copy link
Member

Could you specify the location of another Join call of the task runner?

@git-hulk
Copy link
Member Author

Could you specify the location of another Join call of the task runner?

Sure, the TaskRunner destructor will join as well: https://github.com/apache/incubator-kvrocks/blob/unstable/src/common/task_runner.h#L49

PragmaTwice
PragmaTwice previously approved these changes Jun 12, 2023
Copy link
Member

@PragmaTwice PragmaTwice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Although a dtor with a long time waiting may cause some misunderstanding.

@git-hulk git-hulk dismissed stale reviews from PragmaTwice and torwig via e1e3614 June 12, 2023 08:56
@git-hulk
Copy link
Member Author

LGTM. Although a dtor with a long time waiting may cause some misunderstanding.

Yes, I think we can remove the Cancel and Join in dtor since the Server::Stop will cancel the task runner first. Can take a look again. @PragmaTwice @torwig

@torwig
Copy link
Contributor

torwig commented Jun 12, 2023

@git-hulk Yes, Server::Stop will cancel and Server::Join will join all threads.
Additionally, here https://github.com/apache/incubator-kvrocks/blob/unstable/src/common/task_runner.cc#L48 should we stop joining threads when the first call to util::ThreadJoin or just log an error and continue joining other threads?

@git-hulk
Copy link
Member Author

git-hulk commented Jun 12, 2023

@git-hulk Yes, Server::Stop will cancel and Server::Join will join all threads. Additionally, here https://github.com/apache/incubator-kvrocks/blob/unstable/src/common/task_runner.cc#L48 should we stop joining threads when the first call to util::ThreadJoin or just log an error and continue joining other threads?

oooh, my bad. I didn't see that, updated.

src/server/server.cc Outdated Show resolved Hide resolved
@torwig
Copy link
Contributor

torwig commented Jun 12, 2023

@git-hulk My idea was to change the loop to the following:

for (auto &thread : threads_) {
  if (auto s = util::ThreadJoin(thread); !s) {
    LOG(WARNING) << "Failed to join thread: " << s.Msg();
    continue;
  }
}

What do you think about that? @PragmaTwice What is your opinion on that?

@git-hulk
Copy link
Member Author

git-hulk commented Jun 13, 2023

@git-hulk My idea was to change the loop to the following:

for (auto &thread : threads_) {
  if (auto s = util::ThreadJoin(thread); !s) {
    LOG(WARNING) << "Failed to join thread: " << s.Msg();
    continue;
  }
}

What do you think about that? @PragmaTwice What is your opinion on that?

Yes, it's good to only log an error when failing to join since other places like the worker thread only print an error.

@PragmaTwice
Copy link
Member

@git-hulk My idea was to change the loop to the following:

for (auto &thread : threads_) {
  if (auto s = util::ThreadJoin(thread); !s) {
    LOG(WARNING) << "Failed to join thread: " << s.Msg();
    continue;
  }
}

What do you think about that? @PragmaTwice What is your opinion on that?

Seems good to me.

torwig
torwig previously approved these changes Jun 13, 2023
@git-hulk git-hulk changed the title Fix duplicately join the task runner Fix data race when joining the task runner Jun 13, 2023
@git-hulk
Copy link
Member Author

Thanks all, merging...

@git-hulk git-hulk merged commit e89af52 into apache:unstable Jun 13, 2023
jihuayu pushed a commit to jihuayu/incubator-kvrocks that referenced this pull request Jun 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data race in TBB::concurrent_queue when running integrate tests
3 participants