Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tidb panic when processing insert select query #39611

Closed
gengliqi opened this issue Dec 5, 2022 · 4 comments · Fixed by #39621 or #39741
Closed

tidb panic when processing insert select query #39611

gengliqi opened this issue Dec 5, 2022 · 4 comments · Fixed by #39621 or #39741
Labels
affects-6.5 severity/critical sig/transaction SIG:Transaction type/bug The issue is confirmed as a bug.

Comments

@gengliqi
Copy link
Contributor

gengliqi commented Dec 5, 2022

Bug Report

Please answer these questions before submitting your issue. Thanks!

1. Minimal reproduce step (Required)

Use TPC-H 100 database.
Create a table lineitem1 whose schema is the same as the lineitem table.
Run explain analyze insert into lineitem1 select * from lineitem limit 10000000;

By the way, when I ran explain analyze insert into lineitem1 select * from lineitem limit 1000000;, the result is ok.

2. What did you expect to see? (Required)

A result of explain analyze.

3. What did you see instead (Required)

ERROR 1105 (HY000): runtime error: slice bounds out of range [-20:]
The panic log in TiDB log is:

panic: cannot cleanup staging buffer

goroutine 15244 [running]:
github.com/tikv/client-go/v2/internal/unionstore.(*MemDB).Cleanup(0xc06cb8a960, 0x1)
        /go/pkg/mod/github.com/tikv/client-go/v2@v2.0.3-0.20221129032117-857772dd0907/internal/unionstore/memdb.go:145 +0x192
github.com/pingcap/tidb/store/driver/txn.(*memBuffer).Cleanup(0x1?, 0x1d?)
        /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/store/driver/txn/unionstore_driver.go:74 +0x1c
github.com/pingcap/tidb/session.(*LazyTxn).cleanupStmtBuf(0xc004fabb90)
        /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/txn.go:148 +0x75
github.com/pingcap/tidb/session.(*LazyTxn).cleanup(0xc004fabb90)
        /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/txn.go:458 +0x1e
github.com/pingcap/tidb/session.(*LazyTxn).reset(0x0?)
        /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/txn.go:453 +0x1e
github.com/pingcap/tidb/session.(*LazyTxn).Rollback(0xc004fabb90)
        /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/txn.go:419 +0xb4
github.com/pingcap/tidb/session.(*session).RollbackTxn(0xc004fabb80, {0x4f459b0, 0xc02cf822d0})
        /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:1123 +0x1ce
github.com/pingcap/tidb/session.(*session).Close(0xc004fabb80)
        /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/session/session.go:2584 +0x265
github.com/pingcap/tidb/server.(*TiDBContext).Close(0xc004f42930)
        /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/driver_tidb.go:268 +0xb7
github.com/pingcap/tidb/server.closeConn(0xc0051d8a80, 0xc0048e3d70?)
        /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:357 +0x1dc
github.com/pingcap/tidb/server.(*clientConn).Close(0xc0051d8a80)
        /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/conn.go:341 +0x89
github.com/pingcap/tidb/server.(*Server).onConn.func3()
        /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/server.go:601 +0x2f
github.com/pingcap/tidb/server.(*Server).onConn(0xc000de3000, 0xc0051d8a80)
        /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/server.go:641 +0xbb4
created by github.com/pingcap/tidb/server.(*Server).startNetworkListener
        /home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/server.go:482 +0x5ca

And the tidb log said this query exceeded memory quota.

[2022/12/05 11:40:01.673 +08:00] [WARN] [expensivequery.go:118] ["memory exceeds quota"] [cost_time=60.196304905s] [cop_time=0.005335916s] [process_time=9.453s] [wait_time=0.006s] [request_count=173] [total_keys=2988045] [process_keys=2987872] [num_cop_tasks=173] [process_avg_time=0.054641618s] [process_p90_time=0.153s] [process_max_time=0.177s] [process_max_addr=172.16.4.7:8264] [wait_avg_time=0.000034682s] [wait_p90_time=0s] [wait_max_time=0.001s] [wait_max_addr=172.16.4.4:8264] [stats=lineitem:437825958657654785] [conn_id=4930212337970839957] [user=root] [database=tpch_100] [table_ids="[75]"] [txn_start_ts=437835934320820226] [mem_max="1182678133 Bytes (1.10 GB)"] [sql="explain analyze insert into lineitem1 select * from lineitem limit 10000000"]

4. What is your TiDB version? (Required)

The latest master.

@gengliqi gengliqi added the type/bug The issue is confirmed as a bug. label Dec 5, 2022
@gengliqi
Copy link
Contributor Author

gengliqi commented Dec 5, 2022

/cc @ekexium

@ti-chi-bot ti-chi-bot added may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.0 may-affects-6.1 may-affects-6.2 may-affects-6.3 may-affects-6.4 labels Dec 5, 2022
@ChenPeng2013 ChenPeng2013 added the sig/transaction SIG:Transaction label Dec 5, 2022
@ekexium ekexium added affects-6.5 and removed may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. may-affects-6.0 may-affects-6.1 may-affects-6.2 may-affects-6.3 may-affects-6.4 labels Dec 5, 2022
@ekexium
Copy link
Contributor

ekexium commented Dec 5, 2022

The sequence of events that leads to the panic is:
membuffer.set -> enlarge -> onMemChange -> memory tracker decides to panic -> deferred releasing current staging buffer in AddRecord -> revert to last checkpoint

When reverting to last checkpoint, memdb iterates over the nodes backwards, here is why it panics:
The latest block of the arena is in an intermediate state: it is just allocated but no data has been written yet. So its length is 0. While during the iteration we assume the length cannot be 0.

@aytrack
Copy link
Contributor

aytrack commented Dec 7, 2022

can reproduce it with the case after the fix pr, reopen.

drop table if exists t;
create table t(a int, index idx(a));
insert into t values (1), (2), (3);
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;

set tidb_mem_quota_query=1073741824;
insert /*+ memory_quota(1 MB) */ into t (select * from t);

@ekexium
Copy link
Contributor

ekexium commented Dec 7, 2022

can reproduce it with the case after the fix pr, reopen.

drop table if exists t;
create table t(a int, index idx(a));
insert into t values (1), (2), (3);
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;
insert into t select * from t;

set tidb_mem_quota_query=1073741824;
insert /*+ memory_quota(1 MB) */ into t (select * from t);

It's directly caused by the panicking leaving memdb in an inconsistent state. I've filed a PR to fix it.
But I think there are more risks in the current memory trakcing mechanism, which a panic may leave any component that was assumed should not panic in inconsistent states. cc @XuHuaiyu

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-6.5 severity/critical sig/transaction SIG:Transaction type/bug The issue is confirmed as a bug.
Projects
None yet
5 participants