Hot region scheduler panic #5005

Closed
rleungx opened this issue May 20, 2022 · 0 comments

rleungx commented May 20, 2022

Bug Report

What did you do?

Transfer PD leader.

What did you expect to see?

The PD is working as usual.

What did you see instead?

[2022/05/20 02:06:56.884 +00:00] [FATAL] [log.go:294] [panic] [recover="\"invalid memory address or nil pointer dereference\""] [stack="github.com/tikv/pd/pkg/logutil.LogPanic\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/pkg/logutil/log.go:294\nruntime.gopanic\n\t/usr/local/go/src/runtime/panic.go:965\nruntime.panicmem\n\t/usr/local/go/src/runtime/panic.go:212\nruntime.sigpanic\n\t/usr/local/go/src/runtime/signal_unix.go:734\ngithub.com/tikv/pd/server/schedulers.(*balanceSolver).buildOperator\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/server/schedulers/hot_region.go:1061\ngithub.com/tikv/pd/server/schedulers.(*balanceSolver).solve\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/server/schedulers/hot_region.go:481\ngithub.com/tikv/pd/server/schedulers.(*hotScheduler).balanceHotReadRegions\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/server/schedulers/hot_region.go:265\ngithub.com/tikv/pd/server/schedulers.(*hotScheduler).dispatch\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/server/schedulers/hot_region.go:167\ngithub.com/tikv/pd/server/schedulers.(*hotScheduler).Schedule\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/server/schedulers/hot_region.go:152\ngithub.com/tikv/pd/server/cluster.(*scheduleController).Schedule\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/server/cluster/coordinator.go:873\ngithub.com/tikv/pd/server/cluster.(*coordinator).runScheduler\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/server/cluster/coordinator.go:790"]

The code was first introduced in #3657. Since v5.2.0, a store can report region statistics that feed the hot region scheduler. If a PD restarts and immediately becomes the leader, a region can be selected as a hot region based on those store statistics alone. At that point the region's heartbeat has not been reported yet, so the leader of that hot region is nil, and dereferencing it causes the panic in the hot region scheduler.
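The failure mode described above can be illustrated with a minimal sketch. This is not the actual PD code or the fix that was merged; `Peer`, `Region`, `regionCache`, `leaderStore`, and `errNoLeader` are hypothetical names chosen for illustration. It only shows the general idea of checking for a missing leader before using it, assuming the leader stays nil until the first region heartbeat arrives.

```go
package main

import (
	"errors"
	"fmt"
)

// Peer and Region are hypothetical stand-ins, not PD's real types.
type Peer struct{ StoreID uint64 }

type Region struct {
	ID     uint64
	Leader *Peer // assumed to stay nil until the region's first heartbeat
}

// regionCache mimics the in-memory view of a PD that has just become leader.
type regionCache map[uint64]*Region

var errNoLeader = errors.New("region has no known leader yet")

// leaderStore returns the store that hosts the region's leader. It reports
// an error instead of dereferencing a nil region or a nil leader.
func leaderStore(cache regionCache, regionID uint64) (uint64, error) {
	region, ok := cache[regionID]
	if !ok || region == nil || region.Leader == nil {
		return 0, errNoLeader
	}
	return region.Leader.StoreID, nil
}

func main() {
	cache := regionCache{
		1: {ID: 1, Leader: &Peer{StoreID: 4}},
		2: {ID: 2}, // hot according to store statistics, but no heartbeat yet
	}
	for _, id := range []uint64{1, 2, 3} {
		store, err := leaderStore(cache, id)
		if err != nil {
			fmt.Printf("skip region %d: %v\n", id, err)
			continue
		}
		fmt.Printf("region %d: leader on store %d\n", id, store)
	}
}
```

In this sketch a scheduler would skip regions 2 and 3 and retry on a later round, once heartbeats have populated the leader.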

What version of PD are you using (pd-server -V)?

v5.4.1

@rleungx rleungx added the type/bug The issue is confirmed as a bug. label May 20, 2022
@nolouch nolouch added the cherry-pick-approved Cherry pick PR approved by release team. label May 25, 2022
ti-chi-bot added a commit that referenced this issue May 26, 2022
…5004)

close #5005

Signed-off-by: Ryan Leung <rleungx@gmail.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue May 26, 2022
close tikv#5005

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue May 26, 2022
close tikv#5005

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot added a commit that referenced this issue May 30, 2022
…5004) (#5040)

ref #5004, close #5005

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: Ryan Leung <rleungx@gmail.com>

Co-authored-by: Ryan Leung <rleungx@gmail.com>
ti-chi-bot added a commit that referenced this issue Jun 13, 2022
…5004) (#5039)

ref #5004, close #5005

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: Ryan Leung <rleungx@gmail.com>

Co-authored-by: Ryan Leung <rleungx@gmail.com>
@rleungx rleungx removed the cherry-pick-approved Cherry pick PR approved by release team. label Jun 24, 2022
@rleungx rleungx closed this as completed Jun 24, 2022