Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FAB-18521] Replicate block metadata with block while OSN catching up #2748

Merged
merged 1 commit into from
Jul 15, 2021

Conversation

C0rWin
Copy link
Contributor

@C0rWin C0rWin commented Jul 10, 2021

Type of change

  • Bug fix

Description

While OSN catches up replicating block from the up-to-date replica the consenters' metadata information omitted, i.e.

c.support.WriteBlock(block, nil)

where nil substitutes for block's metadata.

In this commit, the consenters' metadata extracted from the replicated block and being written with the block.

Signed-off-by: Artem Barger artem@bargr.net

@C0rWin C0rWin requested a review from yacovm July 10, 2021 23:58
@C0rWin C0rWin requested a review from a team as a code owner July 10, 2021 23:58
@C0rWin C0rWin force-pushed the commitMetadata branch 2 times, most recently from c1093b2 to aaa7fdf Compare July 11, 2021 00:34
@yacovm
Copy link
Contributor

yacovm commented Jul 11, 2021

I think this bug deserves a JIRA so we can track backporting, no?

Also how did we never encountered this before? How do orderers find the consenter mappings without that metadata if they catch up?

Perhaps consider making the integration test reproduce the problem so we can ensure we're not missing anything

@C0rWin
Copy link
Contributor Author

C0rWin commented Jul 11, 2021

I think this bug deserves a JIRA so we can track backporting, no?

No problem, I will open JIRA with a more detailed explanation of the issue ;)

Also, how did we never encountered this before? How do orderers find the consenter mappings without that metadata if they catch up?

I am asking myself the same question.

Perhaps consider making the integration test reproduce the problem so we can ensure we're not missing anything

Sure, I will make changes to IT to stress out the issue.

@yacovm
Copy link
Contributor

yacovm commented Jul 11, 2021

So when I remove your fix from the production code and re-run the test, it your metadata equality assertion fails, but... the orderer1 still manages to reconstruct the node identities correctly and as a result, to locate the leader and the rest of the nodes.

I guess, this is because we never do a configuration change in the test, therefore the "empty" metadata is equivalent to the non empty metadata in the case of membership construction.

What I think we should strive for, is a test that if your production code change is removed, then the orderer never manages to recover.

@C0rWin
Copy link
Contributor Author

C0rWin commented Jul 11, 2021

So when I remove your fix from the production code and re-run the test, it your metadata equality assertion fails, but... the orderer1 still manages to reconstruct the node identities correctly and as a result, to locate the leader and the rest of the nodes.

I guess, this is because we never do a configuration change in the test, therefore the "empty" metadata is equivalent to the non empty metadata in the case of membership construction.

What I think we should strive for, is a test that if your production code change is removed, then the orderer never manages to recover.

Yes, I realized after pushed the update, that as long as there is no reconfiguration and metadata is absent during the bootstrap OSN will take info from config, this is why I think we never saw this issue manifested before (cause case is quite rare).

PS. Will add reconfiguration.

@C0rWin C0rWin force-pushed the commitMetadata branch 2 times, most recently from 33ab651 to 661e166 Compare July 14, 2021 17:55
While OSN catches up replicating block from the up-to-date replica the
metadata information omitted, i.e.

```
c.support.WriteBlock(block, nil)
```

where `nil` substitutes for block's metadata.

In this commit, the consenters metadata extracted from the replicated
block and being written with the block.

Signed-off-by: Artem Barger <artem@bargr.net>
Copy link
Contributor

@yacovm yacovm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, the test fails now without the fix

@yacovm yacovm merged commit 44ab2bf into hyperledger:main Jul 15, 2021
@yacovm
Copy link
Contributor

yacovm commented Jul 15, 2021

@Mergifyio backport release-2.3

@yacovm
Copy link
Contributor

yacovm commented Jul 15, 2021

@Mergifyio backport release-2.2

mergify bot pushed a commit that referenced this pull request Jul 15, 2021
…#2748)

While OSN catches up replicating block from the up-to-date replica the
metadata information omitted, i.e.

```
c.support.WriteBlock(block, nil)
```

where `nil` substitutes for block's metadata.

In this commit, the consenters metadata extracted from the replicated
block and being written with the block.

Signed-off-by: Artem Barger <artem@bargr.net>
(cherry picked from commit 44ab2bf)

# Conflicts:
#	integration/raft/cft_test.go
@mergify
Copy link

mergify bot commented Jul 15, 2021

Command backport release-2.3: success

Backports have been created

mergify bot pushed a commit that referenced this pull request Jul 15, 2021
…#2748)

While OSN catches up replicating block from the up-to-date replica the
metadata information omitted, i.e.

```
c.support.WriteBlock(block, nil)
```

where `nil` substitutes for block's metadata.

In this commit, the consenters metadata extracted from the replicated
block and being written with the block.

Signed-off-by: Artem Barger <artem@bargr.net>
(cherry picked from commit 44ab2bf)

# Conflicts:
#	integration/raft/cft_test.go
@mergify
Copy link

mergify bot commented Jul 15, 2021

Command backport release-2.2: success

Backports have been created

@denyeart denyeart changed the title Replicate block metadata with block while OSN catching up [FAB-18521] Replicate block metadata with block while OSN catching up Jul 15, 2021
@C0rWin C0rWin deleted the commitMetadata branch July 15, 2021 21:49
@denyeart denyeart mentioned this pull request Jul 16, 2021
@denyeart
Copy link
Contributor

Looks like this merge is causing integration test failures, see #2759 CI.

@C0rWin
Copy link
Contributor Author

C0rWin commented Jul 16, 2021

Looks like this merge is causing integration test failures, see #2759 CI.

@denyeart #2761 should fix that

C0rWin added a commit to C0rWin/fabric that referenced this pull request Jul 16, 2021
PR hyperledger#2748, introduced new IT to ensure the fix. However, there is some
flakiness manifested with this IT, caused by sending the remove
consenter transaction to the "to be removed" node. Removing
the consenter is a config transaction where codes after sending it
ensure a new block with the config update successfully committed,
which is the root cause for the flakiness. Once OSN is removed from
the channel, it no longer can server deliver requests for clients
trying to fetch from it.

This commit, fixes it by sending remove OSN config updated transaction
to a different node instead.

Signed-off-by: Artem Barger <artem@bargr.net>
yacovm pushed a commit that referenced this pull request Jul 17, 2021
PR #2748, introduced new IT to ensure the fix. However, there is some
flakiness manifested with this IT, caused by sending the remove
consenter transaction to the "to be removed" node. Removing
the consenter is a config transaction where codes after sending it
ensure a new block with the config update successfully committed,
which is the root cause for the flakiness. Once OSN is removed from
the channel, it no longer can server deliver requests for clients
trying to fetch from it.

This commit, fixes it by sending remove OSN config updated transaction
to a different node instead.

Signed-off-by: Artem Barger <artem@bargr.net>
mergify bot pushed a commit that referenced this pull request Sep 8, 2021
PR #2748, introduced new IT to ensure the fix. However, there is some
flakiness manifested with this IT, caused by sending the remove
consenter transaction to the "to be removed" node. Removing
the consenter is a config transaction where codes after sending it
ensure a new block with the config update successfully committed,
which is the root cause for the flakiness. Once OSN is removed from
the channel, it no longer can server deliver requests for clients
trying to fetch from it.

This commit, fixes it by sending remove OSN config updated transaction
to a different node instead.

Signed-off-by: Artem Barger <artem@bargr.net>
(cherry picked from commit ffe7d36)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants