Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

fix/chain_head: Ensure correct events for finalized branch #13632

Merged
merged 8 commits into from
Mar 22, 2023

Conversation

lexnv
Copy link
Contributor

@lexnv lexnv commented Mar 17, 2023

Issue paritytech/polkadot-sdk#48 has detected a few flaky tests coming form the chainHead.

Error

The error encountered (extracted from here):

--- STDERR:              sc-rpc-spec-v2 chain_head::tests::follow_forks_pruned_block ---
thread 'chain_head::tests::follow_forks_pruned_block' panicked at 'assertion failed: `(left == right)`
  left: `Finalized(Finalized { finalized_block_hashes: ["0x0af997b079e96b7f333bf2fe71cb914ad3eb49355923fd3db96c0cc91243bd7a"], pruned_block_hashes: [] })`,
 right: `BestBlockChanged(BestBlockChanged { best_block_hash: "0x0af997b079e96b7f333bf2fe71cb914ad3eb49355923fd3db96c0cc91243bd7a" })

Investigation

The issue happens on the following code path:

  • the finalized event is triggered before the new_block event

  • finalized branch is responsible for generating NewBlock + BestBlock event

    /// Generates new block events from the given finalized hashes.
    ///
    /// It may be possible that the `Finalized` event fired before the `NewBlock`
    /// event. In that case, for each finalized hash that was not reported yet
    /// generate the `NewBlock` event. For the final finalized hash we must also
    /// generate one `BestBlock` event.
    fn generate_finalized_events(

  • the best_block_cache is set, however the intent was just to verify if we have encountered a gap in

    self.best_block_cache = Some(*hash);
    },
    // This is the first best block event that we generate.
    None => {
    self.best_block_cache = Some(*hash);
    },
    };
    // This is the first time we see this block. Generate the `NewBlock` event; if this is
    // the last block, also generate the `BestBlock` event.
    events.extend(self.generate_import_events(*hash, *parent, true))

  • the generate_import_events() checks if the best block was already encountered to not regenerate the event.

    if block_cache != block_hash {
    self.best_block_cache = Some(block_hash);
    vec![new_block, best_block_event]

Fix

Remove populating the best_block_cache on the finalized branch to allow generate_import_events() to properly generate the BestBlock event for the latest finalized hash.

Testing Done

  • ran cargo test in a loop with stress-ng --cpu 6 --io 4 for 239 times on my machine. Although it gives us some confidence, I've only encountered the Finalized event arriving before NewBlock just a few times in a more intensive test while introducing the code.

Notes

// CC @paritytech/tools-team

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv lexnv self-assigned this Mar 17, 2023
@lexnv lexnv added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit labels Mar 17, 2023
@lexnv lexnv mentioned this pull request Mar 17, 2023
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Comment on lines 376 to 395
// Make sure we have not encountered a gap in notifications.
// Note: Let `generate_import_events()` populate the cache.
if let Some(best_block_hash) = self.best_block_cache {
// If the best reported block is a children of the last finalized,
// then we had a gap in notification.
let ancestor = sp_blockchain::lowest_common_ancestor(
&*self.client,
*last_finalized,
best_block_hash,
)?;

// A descendent of the finalized block was already reported
// before the `NewBlock` event containing the finalized block
// is reported.
if ancestor.hash == *last_finalized {
return Err(SubscriptionManagementError::Custom(
"A descendent of the finalized block was already reported".into(),
))
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this piece of code? How do you know that there wasn't any new best block event for the last finalized block?

And why do you have this as part of of the for loop while it doesn't depend on any from the loop?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey

How do you know that there wasn't any new best block event for the last finalized block?

We depend on the pin_block which returns true if this is the first time we "pin" the block and false if the block was previously pinned. We call pin_block twice for each block: from the import and finalized branches.

for (hash, parent) in finalized_block_hashes.iter().zip(parents) {
// This block is already reported by the import notification.
if !self.sub_handle.pin_block(*hash)? {
continue
}

And why do you have this as part of of the for loop while it doesn't depend on any from the loop?

For any blocks that we have not seen yet we want to generate the NewBlock, except for the last block for which we want to generate the NewBlock + BestBlock events. For the last block, I've added the extra check to make sure we catch the following case:

[Block 0] -  [Block 1] - [Block 2] -  ...
                          ^^^ T0: Block import branch reports block 2 for the first time 

^^^^^^^^^^^^^^^ T1:  Finalized branch reports 0 and 1 for the first time

We could enter this path only if the BlockImport notification is not generated for 0, 1; but somehow we receive the BlockImport for block 2. And at the same time we get Finalized for 0 and 1 (blocks that we have not been reported by BlockImport).

This check may be an overcautious one, let me know what you think of it, thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me the flow was a little bit hard to follow. I pushed a commit that improves the flow and adds some more docs. I hope that is okay.

return Err(SubscriptionManagementError::BlockNumberAbsent)
};
let Some(parent) = self.client.hash(first_number.saturating_sub(One::one()))? else {
let Some(first_header) = self.client.header(*first_hash)? else {
return Err(SubscriptionManagementError::BlockHashAbsent)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: This sounds like the block hash is absent, but its more HeaderNotFound or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense! I've changed it to BlockHeaderAbsent to make it explicit that we are missing the header from the db, thanks!

client/rpc-spec-v2/src/chain_head/chain_head_follow.rs Outdated Show resolved Hide resolved
bkchr and others added 3 commits March 22, 2023 10:35
Co-authored-by: Sebastian Kunert <skunert49@gmail.com>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv lexnv merged commit d6899f8 into master Mar 22, 2023
@lexnv lexnv deleted the lexnv/fix_chainhead_tests branch March 22, 2023 10:52
breathx pushed a commit to gear-tech/substrate that referenced this pull request Apr 22, 2023
…h#13632)

* chain_head/follow: Ensure correct events for finalized branch

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

* Reenable tests

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

* Do some clean ups and add some more docs

* Fix gramatic

* Update client/rpc-spec-v2/src/chain_head/chain_head_follow.rs

Co-authored-by: Sebastian Kunert <skunert49@gmail.com>

* rpc/chain_head: Introduce error for absent headers

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Bastian Köcher <git@kchr.de>
Co-authored-by: Sebastian Kunert <skunert49@gmail.com>
nathanwhit pushed a commit to nathanwhit/substrate that referenced this pull request Jul 19, 2023
…h#13632)

* chain_head/follow: Ensure correct events for finalized branch

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

* Reenable tests

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

* Do some clean ups and add some more docs

* Fix gramatic

* Update client/rpc-spec-v2/src/chain_head/chain_head_follow.rs

Co-authored-by: Sebastian Kunert <skunert49@gmail.com>

* rpc/chain_head: Introduce error for absent headers

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Bastian Köcher <info@kchr.de>
Co-authored-by: Bastian Köcher <git@kchr.de>
Co-authored-by: Sebastian Kunert <skunert49@gmail.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants