Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

MMR Leaf Data: off-by-one error in beefy_next_authority_set #11797

Closed
Tracked by #1636
acatangiu opened this issue Jul 7, 2022 · 17 comments
Closed
Tracked by #1636

MMR Leaf Data: off-by-one error in beefy_next_authority_set #11797

acatangiu opened this issue Jul 7, 2022 · 17 comments
Assignees
Labels
I3-bug The node fails to follow expected behavior. U1-asap No need to stop dead in your tracks, however issue should be addressed as soon as possible.

Comments

@acatangiu
Copy link
Contributor

acatangiu commented Jul 7, 2022

A given Leaf with MMR 0-based leaf index N-1 is constructed and added to MMR during construction of block N here.

The Leaf data/contents can be seen populated here and it consists of:

// contents for leaf index <N-1> added by block <N>
MmrLeaf {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_<N-1>>,
    ),
    extra_data: <para_heads_of_<N-1>>,
    next_auth_set: <next_auth_set_of<N>>,
}

While this is not a show-stopper - (light)clients can simply handle this in their code - it introduces cognitive complexity and code logic corner cases.

We could avoid this and make it easier for users by changing Leaf<N-1> contents to:

// contents for leaf index <N-1> added by block <N>
MmrLeaf {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_<N-1>>,
    ),
    extra_data: <para_heads_of_<N-1>>,
-   next_auth_set: <next_auth_set_of<N>>,
+   next_auth_set: <next_auth_set_of<N-1>>,
}
@acatangiu acatangiu added I3-bug The node fails to follow expected behavior. U2-some_time_soon Issue is worth doing soon. E5-breaksapi labels Jul 7, 2022
@acatangiu
Copy link
Contributor Author

AFAICT an easy fix is to move Mmr: pallet_mmr before pallet_session so that it builds its leaf before session is rotated.

But I would prefer a fix contained in pallet_mmr itself since above is prone to developer error when adding pallet_mmr to their runtime.

@Lederstrumpf
Copy link
Contributor

Lederstrumpf commented Jul 7, 2022

AFAICT an easy fix is to move Mmr: pallet_mmr before pallet_session so that it builds its leaf before session is rotated.

So since session is rotated prior to leaf construction currently, next_auth_set: <next_auth_set_of<N>> (i.e. Pallet::<T>::beefy_next_authorities()) can be distinct on forks of depth 1?

@vgeddes
Copy link
Contributor

vgeddes commented Jul 7, 2022

We haven't really been affected by this off-by-one issue while developing our light client (or we just worked around it).

The proposed change may affect how our light client detects authority handovers. But I have a feeling there shouldn't be any impact. If there is we can update our code.

Some background on our light client: It receives updates like the following at the start of every beefy session (Assume a session length of 10 blocks). Each update contains the beefy commitment for block N (first block of session) and the MMR leaf for block N-1 (last block of the previous session).

Update:
  BeefyCommitment(block=11) { validatorSetId: 1, ... }
  MmrLeaf(block=10) { nextValidatorSetId: 2, ...}

Update:
  BeefyCommitment(block=21) { validatorSetId: 2, ... }
  MmrLeaf(block=20) { nextvalidatorSetId: 3, ... }

Update:
  BeefyCommitment(block=31) { validatorSetId: 3, ... }
  MmrLeaf(block=30) { nextvalidatorSetId: 4, ... }

The light client knows it can perform an authority handover when the latest update contains a leaf.nextValidatorSetId that is greater than that of the previously observed value (See code here).

@acatangiu
Copy link
Contributor Author

acatangiu commented Jul 7, 2022

AFAICT an easy fix is to move Mmr: pallet_mmr before pallet_session so that it builds its leaf before session is rotated.

So since session is rotated prior to leaf construction currently, next_auth_set: <next_auth_set_of<N>> (i.e. Pallet::<T>::beefy_next_authorities()) can be distinct on forks of depth 1?

There is no issue for forks in either option (current behavior & suggested change) because the validator set is identical for blocks at height N (or N+1, ..., N+m) across all forks.

The issue is more about usability in the sense that the BEEFY payload for block N is a MMR root where latest latest leaf is a set of primitives coming from a mix of blocks N-1 and N; it could be simplified to:

(block numbers are 1-indexed whereas leaf indexes are 0-indexed)
BEEFY payload for block N contains MMR root built from leaves indexed L in 0..=N-1, with each Leaf index L containing primitives coming exclusively from block number L.
At block N, the latest leaf L=N-1 contains primitives values pertaining to block N-1 (instead of some from N-1 and some from N).


Marking this as "Some day maybe" since existing clients already successfully handle current behavior.

@acatangiu acatangiu added U4-some_day_maybe Issue might be worth doing eventually. and removed U2-some_time_soon Issue is worth doing soon. labels Jul 7, 2022
@seunlanlege
Copy link
Contributor

actually have encountered this issue and i created this pr as a fix: #10907

@acatangiu
Copy link
Contributor Author

actually have encountered this issue and i created this pr as a fix: #10907

this is actually something different - by building leaf on_finalize() you actually get:

// contents for leaf index <N-1> added by block <N>
MmrLeaf {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_<N-1>>,
    ),
-   extra_data: <para_heads_of_<N-1>>,
+   extra_data: <para_heads_of_<N>>,
    next_auth_set: <next_auth_set_of<N>>,
}

and at this point you also run into trouble with 1-block deep forks writing different leaf contents to offchain db for same (parent, pos) key.

The offchain-db thing can be fixed by also doing #11799.


@seunlanlege Is the mixed layout of MmrLeaf above really better?

Isn't light-client code simpler if it simply references leaf index N when interested in extra_data of block N?
(i.e. for N=5: leaf index 5 is the leaf added by block 6 containing extra_data of block 5)

@seunlanlege
Copy link
Contributor

seunlanlege commented Sep 22, 2022

Isn't light-client code simpler if it simply references leaf index N when interested in extra_data of block N?

So you're correct that this makes sense, but only from a leaf index perspective.

But light clients won't be tracking leaf indexes, they'll be tracking block heights.

And it's a bit wonky from that perspective that a leaf emitted a block N contains data about N-1 especially when there's no clear advantage to having it this way.

Whereas it's easier to reason that a leaf emitted at block N contains information about block N.

Also the leaves in this N-1 will miss the authority set change. So you'll need a leaf N+1 where N is the authority set change height to know exactly the beginning of a new epoch. Which is just cognitive overhead.

@acatangiu
Copy link
Contributor Author

Aha, that makes sense. So to summarize, you're saying for a light client, the best experience to interact with pallet-mmr is if it exposed leafs with following format:

// contents for leaf added by block <N>
MmrLeaf {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_block<N-1>>,
    ),
    extra_data: <para_heads_at_block<N>>,
    next_auth_set: <next_auth_set_at_block<N>>,
}

right?

@seunlanlege
Copy link
Contributor

Aha, that makes sense. So to summarize, you're saying for a light client, the best experience to interact with pallet-mmr is if it exposed leafs with following format:

yes correct.

@vgeddes
Copy link
Contributor

vgeddes commented Sep 22, 2022

I'm happy with this change too. It makes sense for me and would allow us to improve some of the code in our off-chain relayers. As the relationship between block-height, leaf-index, and the desired leaf.paras-root was previously a bit confusing.

+1

@acatangiu
Copy link
Contributor Author

Aha, that makes sense. So to summarize, you're saying for a light client, the best experience to interact with pallet-mmr is if it exposed leafs with following format:

// contents for leaf added by block <N>
MmrLeaf {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_block<N-1>>,
    ),
    extra_data: <para_heads_at_block<N>>,
    next_auth_set: <next_auth_set_at_block<N>>,
}

I'm sorry @vgeddes @seunlanlege but this doesn't seem to be possible 😢 see #12446 (comment)

Closing for now, please reopen if you have any other ideas.

@acatangiu
Copy link
Contributor Author

We decided we want to do at least this (since #12446 isn't possible):

// contents for leaf index <N-1> added by block <N>
MmrLeaf {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_<N-1>>,
    ),
    extra_data: <para_heads_of_<N-1>>,
-   next_auth_set: <next_auth_set_of<N>>,
+   next_auth_set: <next_auth_set_of<N-1>>,
}

which could be as easy as:
#11797 (comment)

@acatangiu acatangiu reopened this Dec 6, 2022
@acatangiu acatangiu added U1-asap No need to stop dead in your tracks, however issue should be addressed as soon as possible. and removed U2-some_time_soon Issue is worth doing soon. labels Dec 6, 2022
@acatangiu acatangiu assigned serban300 and Lederstrumpf and unassigned serban300 Dec 6, 2022
@Lederstrumpf
Copy link
Contributor

We decided we want to do at least this (since #12446 isn't possible):

// contents for leaf index <N-1> added by block <N>
MmrLeaf {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_<N-1>>,
    ),
    extra_data: <para_heads_of_<N-1>>,
-   next_auth_set: <next_auth_set_of<N>>,
+   next_auth_set: <next_auth_set_of<N-1>>,
}

which could be as easy as: #11797 (comment)

I can confirm this works:

  1. change (only moving pallet_mmr ahead of pallet_session in runtime enum) in https://github.com/Lederstrumpf/polkadot/tree/beefy-mmr-leaf-off-by-one, with log bumps for debugging in https://github.com/Lederstrumpf/substrate/tree/beefy-mmr-leaf-off-by-one,
  2. logs showing that it's taking next_auth_set_of<N-1>: https://hackmd.io/Ve-FPPAJSdeGNc3rc-FmrA

So this would make the leaf content consistent wrt. block numbering.

However, looking at Adrian's PR #12446 again, the reason for closing it was that offchain worker is not guaranteed to run on every block, but since #12753 moved the offchain storage handling to client-side gadget, there's nothing blocking doing next_auth_set_of<N> as implemented in #12446, if I'm not mistaken?

Otherwise, I'll open up PR for next_auth_set_of<N-1>.

@acatangiu
Copy link
Contributor Author

but since #12753 moved the offchain storage handling to client-side gadget, there's nothing blocking doing next_auth_set_of as implemented in #12446, if I'm not mistaken?

it could be done at the expense of extra runtime storage - keep non-canon full data in runtime storage instead of offchain storage and "back it up" to offchain on client-side finality, but then you still have problems when finality lags and aggressive pruning is configured; and obviously increased storage costs for the chain.

I think the wise path is to take the slightly increased code complexity on light-client side and use <N-1>, and thus avoid all the extra runtime storage required to work directly on <N>.

Otherwise, I'll open up PR for next_auth_set_of<N-1>.

I think this is the right call.

@vgeddes @seunlanlege just to confirm there is no difference in light-client "runtime costs" between:

MmrLeaf @ block N {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_<N-1>>,
    ),
    extra_data: <para_heads_of_<N-1>>,
    next_auth_set: <next_auth_set_of<N-1>>,
}

and

MmrLeaf @ block N {
    version: <leaf-data-format-version>,
    (
        parent_num: <N-1>,
        parent_hash: <hash_of_<N-1>>,
    ),
    extra_data: <para_heads_of_<N>>,        <---- diff here
    next_auth_set: <next_auth_set_of<N>>,   <---- diff here
}

@seunlanlege
Copy link
Contributor

seunlanlege commented Jan 12, 2023

@vgeddes @seunlanlege just to confirm there is no difference in light-client "runtime costs" between:

Yes correct. The light client only needs to be aware of a single header, rather than 2.

@vgeddes
Copy link
Contributor

vgeddes commented Jan 12, 2023

Should not be a problem for our light client, I think.

@acatangiu
Copy link
Contributor Author

Fixed in paritytech/polkadot#6577

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
I3-bug The node fails to follow expected behavior. U1-asap No need to stop dead in your tracks, however issue should be addressed as soon as possible.
Projects
No open projects
Status: Done
Development

Successfully merging a pull request may close this issue.

5 participants