[FAB-18521] Replicate block metadata with block while OSN catching up #2748
I think this bug deserves a JIRA so we can track backporting, no? Also, how did we never encounter this before? How do orderers find the consenter mappings without that metadata if they catch up? Perhaps consider making the integration test reproduce the problem so we can ensure we're not missing anything.
No problem, I will open a JIRA with a more detailed explanation of the issue ;)
I am asking myself the same question.
Sure, I will change the IT so that it exercises the issue.
So when I remove your fix from the production code and re-run the test, your metadata equality assertion fails, but orderer1 still manages to reconstruct the node identities correctly and, as a result, to locate the leader and the rest of the nodes. I guess this is because we never do a configuration change in the test, so the "empty" metadata is equivalent to the non-empty metadata as far as membership construction goes. What I think we should strive for is a test in which, if your production code change is removed, the orderer never manages to recover.
Yes, I realized after pushing the update that, as long as there is no reconfiguration and the metadata is absent during bootstrap, the OSN takes the info from the config. This is why I think we never saw this issue manifest before (the case is quite rare). PS: I will add a reconfiguration.
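To make the point above concrete, here is a minimal Go sketch of the fallback being discussed. It is not Fabric's code: `Consenter`, `BlockMetadata`, `metadataOrDefault`, and `sameIDs` are hypothetical stand-ins, and the rule that IDs are assigned sequentially from 1 when metadata is absent is an assumption made for illustration.

```go
package main

// Consenter is a hypothetical stand-in for a consenter entry in the channel config.
type Consenter struct{ Host string }

// BlockMetadata is a hypothetical stand-in for the consenter metadata stored
// with a block: the IDs of the current consenters and the next ID to assign.
type BlockMetadata struct {
	ConsenterIDs    []uint64
	NextConsenterID uint64
}

// metadataOrDefault returns the stored metadata when it is present. When it is
// absent (nil), it derives IDs from the config by numbering the consenters
// sequentially from 1 (an assumption of this sketch).
func metadataOrDefault(stored *BlockMetadata, config []Consenter) BlockMetadata {
	if stored != nil {
		return *stored
	}
	md := BlockMetadata{NextConsenterID: 1}
	for range config {
		md.ConsenterIDs = append(md.ConsenterIDs, md.NextConsenterID)
		md.NextConsenterID++
	}
	return md
}

// sameIDs reports whether two ID lists are identical, element by element.
func sameIDs(a, b []uint64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}
```

With three consenters and no reconfiguration, the derived IDs are 1, 2, 3, which is what the stored metadata would hold, so the missing metadata goes unnoticed. Once a consenter is removed (say the stored IDs become 2, 3), the derived IDs 1, 2 no longer match, which is the situation the reworked test needs to provoke.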
While an OSN catches up by replicating blocks from an up-to-date replica, the metadata information is omitted, i.e.
```
c.support.WriteBlock(block, nil)
```
where `nil` substitutes for the block's metadata. In this commit, the consenters' metadata is extracted from the replicated block and written together with the block.

Signed-off-by: Artem Barger <artem@bargr.net>
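The change described in this commit message can be sketched in Go as follows. This is an illustration under stated assumptions, not the actual diff: `Block`, `Ledger`, `memLedger`, `consenterMetadataIndex`, `consenterMetadata`, and `commitReplicated` are hypothetical stand-ins, and only the shape of the `WriteBlock(block, metadata)` call is taken from the message.

```go
package main

import "errors"

// Block is a simplified, hypothetical stand-in for a ledger block: only the
// metadata slots matter for this sketch.
type Block struct {
	Number   uint64
	Metadata [][]byte // one entry per metadata type
}

// consenterMetadataIndex is a hypothetical slot holding the consenter metadata.
const consenterMetadataIndex = 0

// Ledger is a hypothetical stand-in for the object behind c.support in the
// commit message.
type Ledger interface {
	WriteBlock(block *Block, consenterMetadata []byte)
}

// consenterMetadata extracts the consenter metadata carried by a replicated block.
func consenterMetadata(block *Block) ([]byte, error) {
	if block == nil || len(block.Metadata) <= consenterMetadataIndex {
		return nil, errors.New("block carries no consenter metadata slot")
	}
	return block.Metadata[consenterMetadataIndex], nil
}

// commitReplicated writes a block pulled during catch-up together with the
// metadata it carries, instead of passing nil in its place.
func commitReplicated(l Ledger, block *Block) error {
	md, err := consenterMetadata(block)
	if err != nil {
		return err
	}
	l.WriteBlock(block, md)
	return nil
}

// memLedger is an in-memory Ledger that records the metadata written per block.
type memLedger struct {
	written map[uint64][]byte
}

func (m *memLedger) WriteBlock(block *Block, md []byte) {
	if m.written == nil {
		m.written = map[uint64][]byte{}
	}
	m.written[block.Number] = md
}
```

Calling `commitReplicated` with a `memLedger` and a block whose slot 0 holds some bytes leaves exactly those bytes recorded for that block number, whereas the pre-fix behaviour would have recorded `nil`.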
LGTM, the test fails now without the fix
@Mergifyio backport release-2.3
@Mergifyio backport release-2.2
…#2748)

While an OSN catches up by replicating blocks from an up-to-date replica, the metadata information is omitted, i.e.
```
c.support.WriteBlock(block, nil)
```
where `nil` substitutes for the block's metadata. In this commit, the consenters' metadata is extracted from the replicated block and written together with the block.

Signed-off-by: Artem Barger <artem@bargr.net>
(cherry picked from commit 44ab2bf)

# Conflicts:
#	integration/raft/cft_test.go
Looks like this merge is causing integration test failures; see the #2759 CI.
PR hyperledger#2748 introduced a new IT to verify the fix. However, that IT turned out to be flaky because the remove-consenter transaction was sent to the very node being removed. Removing a consenter is a config transaction, and the code that follows the send checks that a new block with the config update was committed successfully. That check is the root cause of the flakiness: once an OSN is removed from the channel, it can no longer serve deliver requests from clients trying to fetch from it. This commit fixes the problem by sending the remove-OSN config update transaction to a different node instead.

Signed-off-by: Artem Barger <artem@bargr.net>
PR #2748 introduced a new IT to verify the fix. However, that IT turned out to be flaky because the remove-consenter transaction was sent to the very node being removed. Removing a consenter is a config transaction, and the code that follows the send checks that a new block with the config update was committed successfully. That check is the root cause of the flakiness: once an OSN is removed from the channel, it can no longer serve deliver requests from clients trying to fetch from it. This commit fixes the problem by sending the remove-OSN config update transaction to a different node instead.

Signed-off-by: Artem Barger <artem@bargr.net>
(cherry picked from commit ffe7d36)
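The idea behind the flakiness fix, choosing a submission target other than the node being removed, can be sketched in Go as below. `pickSubmitter` and the orderer names are hypothetical and do not come from the integration test itself.

```go
package main

import "errors"

// pickSubmitter returns the first orderer that is not the one being removed,
// so that the config update and the follow-up block fetch both go to a node
// that remains on the channel. The function name is hypothetical.
func pickSubmitter(orderers []string, removed string) (string, error) {
	for _, o := range orderers {
		if o != removed {
			return o, nil
		}
	}
	return "", errors.New("no orderer left to submit the config update to")
}
```

For example, with orderers `orderer1`, `orderer2`, `orderer3` and `orderer1` being removed, the sketch selects `orderer2`; a test would then both submit the update to, and wait for the resulting block on, that node.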
Type of change
Description
While an OSN catches up by replicating blocks from an up-to-date replica, the consenters' metadata information is omitted, i.e. `c.support.WriteBlock(block, nil)`, where `nil` substitutes for the block's metadata.

In this commit, the consenters' metadata is extracted from the replicated block and written together with the block.
Signed-off-by: Artem Barger <artem@bargr.net>