This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Block record is missing from the pruning window #12635

Closed
ggwpez opened this issue Nov 7, 2022 · 9 comments
Labels
I3-bug The node fails to follow expected behavior.

Comments

@ggwpez
Member

ggwpez commented Nov 7, 2022

The node crashes when restarted after a warp-sync. It only happens for ParityDB. RocksDB works fine. cc @cheme

Reproduce

  1. Start a Polkadot node v0.9.31 or later: polkadot --sync warp --db paritydb -d new-data-dir
  2. Press CTRL-C after the warp sync completes, once the download of old blocks starts (⏩ Block history ....)
  3. Restart the node with the same command

It prints this and dies:

# Many of these lines, seemingly depending on how long you let it download "Block history" after the warp sync:
ERROR tokio-runtime-worker state-db: Block record is missing from the pruning window, block number 0
…
ERROR tokio-runtime-worker state-db: Block record is missing from the pruning window, block number 0
ERROR tokio-runtime-worker afg: GRANDPA voter error: could not complete a round on disk: State Database error: Block record is missing from the pruning window
ERROR tokio-runtime-worker sc_service::task_manager: Essential task `grandpa-voter` failed. Shutting down service.
ERROR tokio-runtime-worker state-db: Block record is missing from the pruning window, block number 0
Error:
   0: Other: Essential task failed.
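
To gauge how many of these lines a given run produced, the log can be summarized with a small script. This is only a sketch: the function name `summarize` is hypothetical, and the patterns are assumed from the excerpt above rather than from any documented log format.

```python
import re

# Message text copied from the log excerpt in this issue.
PRUNING_ERROR = "Block record is missing from the pruning window"
# Assumed shape of the task-manager line, based on the excerpt only.
ESSENTIAL_FAILED = re.compile(r"Essential task `([^`]+)` failed")


def summarize(log_lines):
    """Count state-db pruning-window errors and collect failed essential tasks."""
    missing = 0
    failed_tasks = []
    for line in log_lines:
        # Only count lines emitted by the state-db target, not the GRANDPA
        # line that merely repeats the message.
        if "state-db:" in line and PRUNING_ERROR in line:
            missing += 1
        match = ESSENTIAL_FAILED.search(line)
        if match:
            failed_tasks.append(match.group(1))
    return missing, failed_tasks
```

Feeding it the saved node output (for example `summarize(open("node.log"))`, where `node.log` is a placeholder file name) returns the error count and the names of any essential tasks that failed.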

Quote from @davxy

My suspicion here is that there is some data corruption within the DB. It is also strange that when I use parity-db and send a SIGINT (aka CTRL-C), the process takes ~15 sec to stop (<<< has this always happened?)

I also noticed that for the last few versions it has taken much longer to shut down the node. Not sure if this is because I am using ParityDB or something else.
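
The reproduction steps above could be scripted roughly as follows. This is a sketch under assumptions: the command line is the one from the report, the helper names `run_until_marker` and `reproduce` are hypothetical, the "Block history" text is assumed to appear in the node output exactly as in the logs quoted here, and sending SIGINT this way assumes a Unix-like system.

```python
import signal
import subprocess
import sys

# Command taken from the reproduction steps in this issue.
NODE_CMD = ["polkadot", "--sync", "warp", "--db", "paritydb", "-d", "new-data-dir"]


def run_until_marker(cmd, marker):
    """Run cmd, send SIGINT once a log line contains marker, wait for exit.

    Returns True if the marker was seen before the process ended.
    """
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams so logs on either are scanned
        text=True,
    )
    seen = False
    for line in proc.stdout:
        sys.stdout.write(line)  # echo the node output for inspection
        if not seen and marker in line:
            seen = True
            proc.send_signal(signal.SIGINT)  # equivalent of pressing CTRL-C
    # Output is drained until EOF, so a slow shutdown is waited for here.
    proc.wait()
    return seen


def reproduce():
    """Steps 1 to 3: warp sync, interrupt at block history, then restart."""
    if run_until_marker(NODE_CMD, "Block history"):
        return subprocess.run(NODE_CMD).returncode
    return None
```

Calling `reproduce()` on a machine with `polkadot` on the PATH would perform one attempt; the second run is where the crash described above is expected to show up.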

@ggwpez ggwpez added the I3-bug The node fails to follow expected behavior. label Nov 7, 2022
@bkchr
Member

bkchr commented Nov 7, 2022

CC @arkpar

@cheme
Contributor

cheme commented Nov 7, 2022

The ~15 secs are because ParityDB now flushes the WAL when exiting. The final operation is big (it commits all state) and might be related. From other feedback, this last operation also makes memory grow a lot, which is an issue for warp syncing on low-memory machines.
So I already wanted to check whether the state import could be split into multiple commits.
The error itself may be something else (maybe something like: after state import, if block 0 is not imported, we are in an inconsistent state) that shows up more easily on ParityDB. But since warp sync takes quite long, testing may take some time.
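
The idea of splitting the state import into multiple commits can be illustrated generically. This is not ParityDB's or Substrate's API: `chunked`, `import_state`, the `commit` callback and the `max_items` limit are all hypothetical names chosen for the sketch, which only shows how bounded batches keep the amount of data held per commit small.

```python
from itertools import islice


def chunked(pairs, max_items):
    """Yield lists of at most max_items (key, value) pairs from an iterable."""
    iterator = iter(pairs)
    while True:
        batch = list(islice(iterator, max_items))
        if not batch:
            return
        yield batch


def import_state(pairs, commit, max_items=10_000):
    """Commit the state in bounded batches instead of one large commit.

    commit is a caller-supplied function standing in for a database write.
    Returns the number of commits issued.
    """
    commits = 0
    for batch in chunked(pairs, max_items):
        commit(batch)  # only one batch is held in memory at a time
        commits += 1
    return commits
```

One trade-off to keep in mind: with several commits the import is no longer a single atomic write, so an interrupted import would leave partial state behind and would need some way to be detected or resumed.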

@ggwpez
Member Author

ggwpez commented Nov 7, 2022

But since warp sync takes quite long, testing may take some time.

It takes 1 to 15 minutes for me, depending on hardware and bandwidth.

@cheme
Contributor

cheme commented Nov 7, 2022

It just took me 5 minutes, which is quite long (it could probably run faster when syncing against a single local full node).
From a few tests, I observed another failure if I press CTRL-C during the "importing state" step (same with RocksDB).
For parity-db I observed the error every time, but actually, even without interrupting the sync, the node is stuck after warp sync on

2022-11-07 22:08:54 Warp sync is complete (596 MiB), restarting block sync.    
2022-11-07 22:08:55 ✨ Imported #12825895 (0xf4bb…40c4)    
2022-11-07 22:08:55 Error occurred while computing tree_route from 0x91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3 to 0xf4bbd2749212935d9dc7803fe61364f890a3762eb885c5bad2b3b1e3dcc040c4: Blockchain error: UnknownBlock: Header was not found in the database: 0x8ec77811749a5f6d17212f0a4bd7662090160b000fab8269517fdc6b54ef6db7    
2022-11-07 22:08:57 ⏩ Block history, #8512 (16 peers), best: #12825895 (0xf4bb…40c4), finalized #12825872 (0xf61a…edea), ⬇ 939.5kiB/s ⬆ 21.4kiB/s    
2022-11-07 22:09:02 Error occurred while computing tree_route from 0x91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3 to 0xbe0e6c32d74dc29de09c60418273ff902acd11681e92c03f93ab66d728180798: Blockchain error: UnknownBlock: Header was not found in the database: 0x8ec77811749a5f6d17212f0a4bd7662090160b000fab8269517fdc6b54ef6db7    
2022-11-07 22:09:02 ✨ Imported #12825896 (0xbe0e…0798)    
2022-11-07 22:09:02 ⏩ Block history, #22528 (13 peers), best: #12825896 (0xbe0e…0798), finalized #12825872 (0xf61a…edea), ⬇ 1.0MiB/s ⬆ 30.9kiB/s    
2022-11-07 22:09:04 Error occurred while computing tree_route from 0x91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3 to 0x0228c52935e04245ba9b467a50457afcc88ee1b29a7f0b93fa1b730504d6a0ce: Blockchain error: UnknownBlock: Header was not found in the database: 0x8ec77811749a5f6d17212f0a4bd7662090160b000fab8269517fdc6b54ef6db7    
2022-11-07 22:09:07 Error occurred while computing tree_route from 0x91b171bb158e2d3848fa23a9f1c25182fb8e20313b2c1eb49219da7a70ce90c3 to 0xcf98047bc378ca9b1424296ead2bcf41c83cc0cc8678ea44d70fcb8f25785d0a: Blockchain error: UnknownBlock: Header was not found in the database: 0x8ec77811749a5f6d17212f0a4bd7662090160b000fab8269517fdc6b54ef6db7    
2022-11-07

which I don't see with RocksDB.
Edit: I just got the message with RocksDB too; restarting the node seems to fix this.

@ggwpez
Member Author

ggwpez commented Nov 7, 2022

For parity-db I observed the error every time, but actually, even without interrupting the sync, the node is stuck after warp sync on

The tree_route error should be fixed by #12632, but it seems to be independent of this issue.

@cheme
Contributor

cheme commented Nov 7, 2022

I just ran with parity-db logging enabled and saw nothing wrong (the last queued commit, up to 8803, got processed and enacted; WAL log0, the size of the state, and log1, which grows while syncing, are both flushed and deleted properly on exit).
So I don't feel like this is related to the DB being corrupted, but I will test tomorrow with parity-db master (or the latest published release).

@arkpar
Member

arkpar commented Nov 7, 2022

This should be fixed with #12239
Waiting for review @bkchr

@cheme
Contributor

cheme commented Nov 8, 2022

%s/should be/is
(I had everything in place to test, so I did; only one try, though.)

@ggwpez
Member Author

ggwpez commented Nov 8, 2022

Fixed by the two MRs mentioned above.

@ggwpez ggwpez closed this as completed Nov 8, 2022