
FAB-18244 single node catches up with snapshot #1964

Merged
1 commit merged into hyperledger:master on Oct 5, 2020

Conversation

guoger (Member) commented on Oct 3, 2020

Support.WriteBlock commits blocks to the ledger asynchronously and can
have up to one block in flight, so there is a possibility that a node
crashes before such a block is persisted successfully. Normally, when
the node restarts, Raft loads entries from the WAL and attempts to
re-apply them. However, when a snapshot is taken at this block, only
entries after the snapshot (if any) are loaded, and in a single-node
situation we end up hanging forever, waiting for the missing block to
be pulled from nowhere.

A straightforward solution would be to peek at the ledger tip first and
decide whether to load some "old" entries from the WAL, instead of
blindly loading data after the latest snapshot. However, it's trickier
than it sounds:

  • today, we don't strictly respect the contract between Raft and the
    state machine, under which applied data must not be lossy, making it
    safe to prune WAL data after a snapshot. In the extreme case, if we
    lose the entire ledger, we should not expect it to be recoverable
    from the WAL

  • the etcd/raft persistence library does not provide friendly
    interfaces for controlling what data to load in a fine-grained
    manner. For example, snap.Load() simply looks for the latest
    snapshot available and loads entries after it. If we'd like to load
    older data from before that snapshot, we'll need to come up with
    our own utilities

This commit aims to provide a quick fix for the bug described in
FAB-18244, leveraging the fact that we can have only one async block in
flight, and leaves the "correct" solution to future work.

Signed-off-by: Jay Guo <guojiannan1101@gmail.com>
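
To make the failure mode and the fix concrete, here is a minimal,
self-contained Go sketch of the decision described above. All names are
hypothetical; the real logic lives in the etcdraft Chain's catchUp
method, and this only illustrates the reasoning, not the actual
implementation.

	package main

	import "fmt"

	// decide mirrors the catch-up decision described in the commit
	// message: given the ledger tip and the block number carried by
	// the snapshot, choose how to catch up.
	func decide(ledgerTip, snapBlock uint64) string {
		switch {
		case ledgerTip >= snapBlock:
			// The ledger already contains the snapshot's block.
			return "no sync needed"
		case snapBlock == ledgerTip+1:
			// Exactly one block is missing. Since WriteBlock allows at
			// most one async block in flight, this must be the block
			// lost in the crash, and the snapshot itself carries it,
			// so it can be committed directly.
			return "commit the block embedded in the snapshot"
		default:
			// More than one block is missing: pull from other
			// orderers, which in a single-node cluster blocks forever.
			return "pull missing blocks from other nodes"
		}
	}

	func main() {
		// FAB-18244 scenario: block 100 was in flight when the node
		// crashed, and the snapshot was taken at block 100, so WAL
		// replay starts at entry 101 and block 100 is never re-applied
		// from the log.
		fmt.Println(decide(99, 100))
	}

Without the middle case, decide(99, 100) would fall through to pulling
blocks, which is exactly the single-node hang described above.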

guoger requested a review from a team as a code owner on October 3, 2020, 06:38
jyellick (Contributor) left a comment
Although this solution isn't ideal, it's definitely simple and fixes a real bug, so merging. I'm certainly not eager to re-implement any of etcd's WAL parsing/loading, especially if the issue is fixed. I'll continue to think on whether there's a nicer solution, and I'd appreciate it if you would as well.

@@ -900,6 +900,11 @@ func (c *Chain) catchUp(snap *raftpb.Snapshot) error {
	if c.lastBlock.Header.Number >= b.Header.Number {
		c.logger.Warnf("Snapshot is at block [%d], local block number is %d, no sync needed", b.Header.Number, c.lastBlock.Header.Number)
		return nil
	} else if b.Header.Number == c.lastBlock.Header.Number+1 {
I definitely don't love this. Though it is fairly obviously correct, it injects knowledge of the sync properties of the WriteBlock call into the Raft implementation.
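
For context, here is a plausible body for the new else-if branch,
reconstructed from the commit message. The hunk above shows only the
condition, so the commitBlock helper, the log line, and the error
handling below are assumptions rather than the verbatim change:

	} else if b.Header.Number == c.lastBlock.Header.Number+1 {
		// The single missing block is the async in-flight block lost
		// in the crash; the snapshot itself contains it, so commit it
		// directly instead of trying to pull it from other orderers.
		c.logger.Infof("Snapshot is at block [%d], one ahead of the ledger tip; committing it directly", b.Header.Number)
		if err := c.commitBlock(b); err != nil {
			return errors.Errorf("failed to commit block [%d]: %s", b.Header.Number, err)
		}
		c.lastBlock = b
		return nil
	}

This is the knowledge injection the comment above objects to: the
branch is only safe because WriteBlock guarantees at most one block in
flight.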

@jyellick jyellick merged commit e264d1b into hyperledger:master Oct 5, 2020
guoger added a commit to guoger/fabric that referenced this pull request Oct 19, 2020
C0rWin pushed a commit that referenced this pull request Oct 19, 2020