Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tikv has already be moved to other machine,but drainer still send msg to it #1000

Open
jwongz opened this issue Sep 3, 2020 · 5 comments
Open
Labels

Comments

@jwongz
Copy link

jwongz commented Sep 3, 2020

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.

migrate tikv

  1. What did you expect to see?

normal

  1. What did you see instead?

drainer error


{"log":"[2020/09/03 11:04:23.736 +08:00] [INFO] [client_batch.go:257] [\"batchRecvLoop re-create streaming fail\"] [target=xxx:10000] [error=\"rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: Error while dialing dial tcp xxx:10000: connect: connection refused\\\"\"]\n","stream":"stdout","time":"2020-09-03T03:04:23.736332662Z"}
{"log":"[2020/09/03 11:04:23.782 +08:00] [INFO] [client_batch.go:257] [\"batchRecvLoop re-create streaming fail\"] [target=xxx:10000] [error=\"rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: Error while dialing dial tcp xxx:10000: connect: connection refused\\\"\"]\n","stream":"stdout","time":"2020-09-03T03:04:23.782749447Z"}
{"log":"[2020/09/03 11:04:23.872 +08:00] [INFO] [syncer.go:251] [\"write save point\"] [ts=419195062555246597]\n","stream":"stdout","time":"2020-09-03T03:04:23.872408938Z"}
{"log":"[mysql] 2020/09/03 11:04:23 packets.go:36: read tcp xxxa:32768-\u003exxxa:20076: read: connection reset by peer\n","stream":"stderr","time":"2020-09-03T03:04:23.897010135Z"}
{"log":"[2020/09/03 11:04:23.985 +08:00] [FATAL] [syncer.go:254] [\"save checkpoint failed\"] [ts=419195062555246597] [error=\"query sql failed: replace into tidb_binlog.checkpoint values(6625488485425191173, '{\\\"commitTS\\\":419195062555246597,\\\"ts-map\\\":{}}'): invalid connection\"] [errorVerbose=\"invalid connection\\nquery sql failed: replace into tidb_binlog.checkpoint values(6625488485425191173, '{\\\"commitTS\\\":419195062555246597,\\\"ts-map\\\":{}}')\\ngithub.mirror.nvdadr.com/pingcap/tidb-binlog/drainer/checkpoint.(*MysqlCheckPoint).Save\\n\\t/home/pirlo.wang/go/src/tidb-binlog/drainer/checkpoint/mysql.go:137\\ngithub.mirror.nvdadr.com/pingcap/tidb-binlog/drainer.(*Syncer).savePoint\\n\\t/home/pirlo.wang/go/src/tidb-binlog/drainer/syncer.go:252\\ngithub.mirror.nvdadr.com/pingcap/tidb-binlog/drainer.(*Syncer).handleSuccess\\n\\t/home/pirlo.wang/go/src/tidb-binlog/drainer/syncer.go:226\\ngithub.mirror.nvdadr.com/pingcap/tidb-binlog/drainer.(*Syncer).run.func1\\n\\t/home/pirlo.wang/go/src/tidb-binlog/drainer/syncer.go:270\\nruntime.goexit\\n\\t/usr/local/go/src/runtime/asm_amd64.s:1337\"] [stack=\"github.com/pingcap/log.Fatal\\n\\t/home/pirlo.wang/go/pkg/mod/github.com/pingcap/log@v0.0.0-20190715063458-479153f07ebd/global.go:59\\ngithub.mirror.nvdadr.com/pingcap/tidb-binlog/drainer.(*Syncer).savePoint\\n\\t/home/pirlo.wang/go/src/tidb-binlog/drainer/syncer.go:254\\ngithub.mirror.nvdadr.com/pingcap/tidb-binlog/drainer.(*Syncer).handleSuccess\\n\\t/home/pirlo.wang/go/src/tidb-binlog/drainer/syncer.go:226\\ngithub.mirror.nvdadr.com/pingcap/tidb-binlog/drainer.(*Syncer).run.func1\\n\\t/home/pirlo.wang/go/src/tidb-binlog/drainer/syncer.go:270\"]\n","stream":"stdout","time":"2020-09-03T03:04:23.98599254Z"}}

  1. Please provide the relate downstream type and version of drainer.
    (run drainer -V in terminal to get drainer's version)

v3.0.5

@jwongz jwongz added the type/bug This issue is a bug label Sep 3, 2020
@kennytm
Copy link
Contributor

kennytm commented Sep 3, 2020

PTAL @july2993.

Error caused by invalid connection to the checkpoint database (verifying the roles of the IP addresses). There should probably be a retry loop in (*MysqlCheckPoint).Save.

@july2993
Copy link
Contributor

july2993 commented Sep 3, 2020

PTAL @july2993.

Error caused by invalid connection to the checkpoint database (verifying the roles of the IP addresses). There should probably be a retry loop in (*MysqlCheckPoint).Save.

There's no try now. It just fatal if failed to save checkpoint(default saved at the downstream db).
I don't think the scale-in tikv at upstream is related to this problem

@jwongz
Copy link
Author

jwongz commented Sep 3, 2020

since tikv has be moved for two days,this log is nerver printed

{"log":"[2020/09/03 11:11:00.859 +08:00] [INFO] [syncer.go:251] [\"write save point\"] [ts=419195167884705793]\n","stream":"stdout","time":"2020-09-03T03:11:00.85920493Z"}

and this log print for two days:

{"log":"[2020/09/03 11:04:23.736 +08:00] [INFO] [client_batch.go:257] [\"batchRecvLoop re-create streaming fail\"] [target=xxx:10000] [error=\"rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \\\"transport: Error while dialing dial tcp xxx:10000: connect: connection refused\\\"\"]\n","stream":"stdout","time":"2020-09-03T03:04:23.736332662Z"}

@jwongz
Copy link
Author

jwongz commented Sep 3, 2020

After drainer-server restart, it's ok now

@july2993
Copy link
Contributor

july2993 commented Sep 3, 2020

{"log":"[2020/09/03 11:04:23.736 +08:00] [INFO] [client_batch.go:257] ["batchRecvLoop re-create streaming fail"] [target=xxx:10000] [error="rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = \"transport: Error while dialing dial tcp xxx:10000: connect: connection refused\""]\n","stream":"stdout","time":"2020-09-03T03:04:23.736332662Z"}

This is probably is the bug of tikv client pingcap/tidb#10301.
and is already fixed in the latest version.

But I not sure what it will cause drainer can no longer work forward and save checkpoint.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants