Invalid justification provided #4678
Comments
CC @andresilva |
Could you provide some more information? What version of substrate are you using? Recently we added more logging for this, if you're running with a reasonably up-to-date master it would be useful to collect logs with |
I'm on substrate version: d2c4b0d
|
@andresilva I have seen this on my Centrifuge flint node, which was running rc3 while the other nodes were running alpha-2. I have restarted with |
So what was the resolution to this? Is there a fix in substrate master? |
Yes, there were multiple fixes targeting this issue. If you encounter it again with a recent |
That is great! At the moment we are on a 2.x branch, so I can't test it. What is the issue number, so we can track the commits and try to backport them to 2.0.0? Are there any plans to provide this fix on a 2.x branch? |
I think this is happening again. I'm on rococo-v1, and the error has recently reappeared on both our mainnet and testnet. Once it happens, the peer count drops to 0, which is very dangerous. A restart fixes it, but this should not happen. Any idea? @andresilva
|
Interesting. Even though the peer count is 0, after a while the node can still sync some blocks.
|
#7640 introduced a migration that had a bug which borked the existing justifications in the database. It was later fixed by #8489. But any node that ran the code from #7640 will have broken its justifications and will be serving invalid justifications. It could be another issue but this one is a prime suspect. |
Okay let me update my testnet to #8489. |
I think that if the borked migration were the cause, then it wouldn't be fixed by restarting the node (unless restarting means you connect to some different peer). Is this easily reproducible? I will try to have a look into this in the next few days. |
I'll PM you on Element. I can provide some information. |
Hi, we have a justification error. Our bootnodes seem to work OK (they sync best and finalized blocks), but our rpc nodes are stuck and won't sync. We tried clearing the db and resyncing only from the bootnodes, but we get a justification error at block 104960.
Any idea how to fix this? We have a lot of nodes in the network. |
The logs you posted don't show anything related to finality. Could you have a look at my earlier comments about the borked db migration to see if they apply to your node? Can you replicate the issue if you don't sync just from your bootnodes but instead sync from other nodes in the network (if it works this would be an indication that your bootnodes' db is borked)? There is a pending issue with justifications that can lead to the |
Thanks. We don't have a problem with finality: the network continues to produce and finalize blocks, but some nodes, including our rpc and archive nodes, are stuck at different finalized blocks than the rest of the network. When we try to re-sync our rpc nodes we get
Our bootnodes were validators at first, before staking was opened. Now they are not selected into the validator set, but they are still running as validators. |
I've looked at the logs once again, and I sent you the wrong part, sorry. The full log is too big for GitHub, so if there is something specific I should search for, please let me know. Thanks for the help. |
@martinfridrich I synced your chain to look into the problem. The issue is related to forced changes, so it's not related to the issue that was previously found here. From my analysis, block #105469 created a forced change which stated that the best finalized block was #103166 when in fact the best finalized block was at least #104960. This caused different nodes to have inconsistent views of what the actual set id is, which leads to justifications that cannot be verified. Any node syncing the chain will now be unable to validate finality moving forward. I can't figure out everything about the current state, as the nodes that are actually finalizing have a different view of what happened that I can't see now. Off the top of my head I don't know of any easy way to fix this, and there's nothing for me to fix here, as you used a critical API (forcing authority set changes) with wrong parameters. Here are some logs that show the issue:
You won't see the last line in your own logs as I added it myself, but this was the parameter that you passed to the |
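To make the failure mode above concrete, here is a minimal sketch of the kind of sanity check an operator could run on the parameters before forcing an authority set change. It is not Substrate code: `ForcedChangeError` and `check_forced_change` are hypothetical names invented for this illustration, and block numbers are simplified to `u32`. The only rule it encodes is the one described in this thread, namely that the best finalized block claimed by a forced change must not be older than what has actually been finalized.

```rust
/// Hypothetical error type for this sketch; not part of Substrate.
#[derive(Debug, PartialEq)]
enum ForcedChangeError {
    /// The claimed best finalized block is older than the block
    /// that has actually been finalized.
    StaleBestFinalized { claimed: u32, actual: u32 },
}

/// Hypothetical pre-flight check for a forced authority set change.
///
/// `claimed_best_finalized` is the value the operator intends to pass,
/// `actual_best_finalized` is the highest block known to be finalized.
fn check_forced_change(
    claimed_best_finalized: u32,
    actual_best_finalized: u32,
) -> Result<(), ForcedChangeError> {
    if claimed_best_finalized < actual_best_finalized {
        // Nodes that finalized past the claimed block would disagree with
        // nodes that apply the forced change about the current set id.
        return Err(ForcedChangeError::StaleBestFinalized {
            claimed: claimed_best_finalized,
            actual: actual_best_finalized,
        });
    }
    Ok(())
}
```

With the numbers from this thread, `check_forced_change(103_166, 104_960)` returns the `StaleBestFinalized` error, while `check_forced_change(104_960, 104_960)` returns `Ok(())`.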
Hey guys,
|
Sometimes, the nodes report
Invalid justification provided
and then fall a few blocks behind the latest block. I have to restart those nodes for them to (re)participate in validating.