-
Notifications
You must be signed in to change notification settings - Fork 8.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FAB-11334] Adds a new 'peer node unjoin' feature #2732
Conversation
745c695
to
1353dee
Compare
Can you explain what is the fundamental reason we are shutting down the peer for that? Can't this be done while the peer is running and servicing other channels? |
Hi @yacovm, thanks for the inquiry and this is a good question. There is no fundamental reason why unjoin can not operate while the peer is running. @manish-sethi articulated the pros/cons of running the unjoin in both online and offline modes - after review of approach alternatives we elected to run the unjoin in offline mode for simplicity and as a compromise to the current operational practice of pause+reset. Everyone is in agreement that there is still a need for the online unjoin channel operation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jkneubuh for your first PR. Left you a few comments.
return errors.Wrap(err, "as another peer node command is executing,"+ | ||
" wait for that command to complete its execution or terminate it before retrying") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and other places below: errors.WithMessage
fits better here. Wrap
function creates a new stack trace that we use if the underlying error is thrown by a third party library.
if err := idStore.updateLedgerStatus(ledgerID, status); err != nil { | ||
return err | ||
} | ||
|
||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err := idStore.updateLedgerStatus(ledgerID, status); err != nil { | |
return err | |
} | |
return nil | |
return idStore.updateLedgerStatus(ledgerID, status) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... I mildly disagree with this pattern / convention. It's OK but I actually prefer the explicit assignment and test of the error routine for absolute consistency. It seems like many (all?) golang routines consist of a serial flow of do-something+check-error+do-something+check-error+... invocations. When then final invocation in a series of calls is used as the error for the outer function, the reader needs to stop and think about the return type of the last call in the chain, rather than recognize the pattern as a a cookie cutter linear flow. For instance, in the two examples above, the explicit 'return nil' case is 100% clear that the routine is returning an OK result. In the latter, looking at the bottom of the routine it's not 100% clear that the method is returning an error, the new status, or the previous status of the ledger. Call me simple but I prefer the explicit style - the intent is clearer, and leaves opportunities for post-call logging, future method calls without restructuring, and ... ease of use when attached to a debugger. IMHO it's worth the extra typing. :)
// Subsequent unjoins will not throw an error | ||
require.NoError(t, UnjoinChannel(conf, ledgerID)) | ||
require.NoError(t, UnjoinChannel(conf, ledgerID)) | ||
require.NoError(t, UnjoinChannel(conf, ledgerID)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
This is not a desired behavior. You got this here because you missed deleting the entry from the idStore. Otherwise, it would have rightly returned this error.
-
Another aspect of the behavior would be that it should be able to create the ledger with the same ID again. But this test perhaps fits better in kvledger/tests folder - where we maintain the tests for testing the behavior under various scenarios (including when the ledger contains private data) unlike in this package where the tests are more for very basic code coverage. Keeping that in mind, I would suggest that the two tests that you may want to retain here in package would be the first and last only as the rest of the tests would be covered in kvledger/tests package.
if err := removeLedgerData(config, ledgerID); err != nil { | ||
return errors.Wrapf(err, "deleting ledger [%v]", ledgerID) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- After having the data deleted successfully, we need to delete the entry of the ledger from the idStore; leaving the peer in the state that it can join a channel with same name again.
- We would need make change in the boot process so that we continue to delete the partially deleted ledgers, perhaps because of a crash during execution of this commend - as we do for partially constructed here. But this change can be done in a separate PR as it would need it's own tests to make sure the crash consistency of the deletion.
require.NoError(t, UnjoinChannel(conf, ledgerID)) | ||
} | ||
|
||
func TestUpdateLedgerStatus(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though it does not harm having this as a separate test but in any case, the assertions made in this tests should be made in the test TestUnjoinChannel
above and as such this test can be removed.
require.Error(t, UnjoinChannel(conf, ledgerID), | ||
"as another peer node command is executing, wait for that command to complete its execution or terminate it before retrying") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For matching the errors, either we use EqualError
or Contains
funciton.
Understood, thanks. So the reason I asked that, is, that if the peer boots after being operated on and having the channel being marked as under deletion, it will obviously not initialize the in-memory structures and goroutines of that channel, because it will first delete the ledger and then resume bootstrapping. In gossip, I designed the ability to make the peer leave a channel (many years ago, in 2017) and it was under the assumption the channel leave API will be triggered when the peer is online, which will give the peer time to broadcast a message that makes other peers ignore it is in the channel. While it is possible to still artificially call JoinChan after bootstrapping, on gossip, just for the channel, and then immediately call LeaveChannel, to make that message broadcasted, it is not effective as the membership view will most likely be empty when the peer boots up. So, maybe this gap should be documented at least. |
4458e58
to
3562edd
Compare
The UnjoinChannel routine will mark a ledger as pending delete, removing all channel specific data from the peer. The routine must be invoked while the peer is down. Peer operators may issue the deletion by running the companion 'peer node unjoin' command from the console. Signed-off-by: Josh Kneubuhl <jkneubuh@us.ibm.com>
3562edd
to
648175a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple nit comments. Can be fixed in a separate PR or with next PR.
logger.Infow("channel has been successfully unjoined", "ledgerID", ledgerID) | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike error strings, log statements would typically start with a full sentence.
// UnjoinChannel removes the data for a ledger and sets the status to UNDER_DELETION. This function is to be | ||
// invoked while the peer is shut down. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stale function comment as this function now clears the UNDER_DELETION status.
The UnjoinChannel routine will mark a ledger as pending delete, removing all
channel specific data from the peer. The routine must be invoked while the
peer is down. Peer operators may issue the deletion by running the companion
'peer node unjoin' command from the console.
Type of change
Description
As a peer admin, I need a way to pause/resume a channel
Unjoining a is a helpful routine for operators managing channel lifecycles. In some cases, a network participant may opt out of a channel, requesting a complete deletion of all channel specific data from the ledger. While an operator may issue a node reset after pausing a channel, this is not viable in cases where the re-initialization requires the synchronization of a large (millions / billions) number of transactions.
Related issues
FAB-16035
FAB-4481
FAB-17787
FAB-17801