[FAB-11334] Adds a new 'peer node unjoin' feature #2732

jkneubuh · 2021-07-01T13:37:46Z

The UnjoinChannel routine will mark a ledger as pending delete, removing all
channel specific data from the peer. The routine must be invoked while the
peer is down. Peer operators may issue the deletion by running the companion
'peer node unjoin' command from the console.

Type of change

New feature

Description

As a peer admin, I need a way to pause/resume a channel

Unjoining a channel is a helpful routine for operators managing channel lifecycles. In some cases, a network participant may opt out of a channel, requesting a complete deletion of all channel specific data from the ledger. While an operator may issue a node reset after pausing a channel, this is not viable in cases where the re-initialization requires the synchronization of a large number (millions or billions) of transactions.

Related issues

FAB-16035
FAB-4481
FAB-17787
FAB-17801

yacovm · 2021-07-01T18:16:45Z

The routine must be invoked while the peer is down.

Can you explain what is the fundamental reason we are shutting down the peer for that? Can't this be done while the peer is running and servicing other channels?

jkneubuh · 2021-07-01T19:33:03Z

Hi @yacovm, thanks for the inquiry and this is a good question. There is no fundamental reason why unjoin can not operate while the peer is running. @manish-sethi articulated the pros/cons of running the unjoin in both online and offline modes - after review of approach alternatives we elected to run the unjoin in offline mode for simplicity and as a compromise to the current operational practice of pause+reset. Everyone is in agreement that there is still a need for the online unjoin channel operation.

manish-sethi

Thanks @jkneubuh for your first PR. Left you a few comments.

manish-sethi · 2021-07-01T19:54:36Z

core/ledger/kvledger/unjoin_channel.go

+		return errors.Wrap(err, "as another peer node command is executing,"+
+			" wait for that command to complete its execution or terminate it before retrying")


Here and in other places below: errors.WithMessage fits better. The Wrap function records a new stack trace, which we use only when the underlying error comes from a third-party library.

manish-sethi · 2021-07-01T19:55:09Z

core/ledger/kvledger/unjoin_channel.go

+	if err := idStore.updateLedgerStatus(ledgerID, status); err != nil {
+		return err
+	}
+
+	return nil


Suggested change

-	if err := idStore.updateLedgerStatus(ledgerID, status); err != nil {
-		return err
-	}
-
-	return nil
+	return idStore.updateLedgerStatus(ledgerID, status)

jkneubuh · 2021-07-02

Hm... I mildly disagree with this pattern / convention. It's OK but I actually prefer the explicit assignment and test of the error for absolute consistency. It seems like many (all?) golang routines consist of a serial flow of do-something+check-error+do-something+check-error+... invocations. When the final invocation in a series of calls is used as the error for the outer function, the reader needs to stop and think about the return type of the last call in the chain, rather than recognize the pattern as a cookie-cutter linear flow. For instance, in the two examples above, the explicit 'return nil' case is 100% clear that the routine is returning an OK result. In the latter, looking at the bottom of the routine it's not 100% clear whether the method is returning an error, the new status, or the previous status of the ledger. Call me simple but I prefer the explicit style - the intent is clearer, and it leaves opportunities for post-call logging, future method calls without restructuring, and ... ease of use when attached to a debugger. IMHO it's worth the extra typing. :)
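For readers weighing the two styles debated above, here is a minimal sketch showing that they behave identically, so the choice is about readability only. The `idStore` type below is a hypothetical in-memory stand-in, not the Fabric implementation, and the function names `setStatusExplicit` and `setStatusDirect` are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// idStore is a hypothetical stand-in for the ledger ID store; it is NOT the Fabric type.
type idStore struct {
	status map[string]string
}

// updateLedgerStatus records a new status and fails for unknown ledgers.
func (s *idStore) updateLedgerStatus(ledgerID, status string) error {
	if _, ok := s.status[ledgerID]; !ok {
		return errors.New("ledger not found: " + ledgerID)
	}
	s.status[ledgerID] = status
	return nil
}

// setStatusExplicit checks the error, then returns nil explicitly.
func setStatusExplicit(s *idStore, ledgerID, status string) error {
	if err := s.updateLedgerStatus(ledgerID, status); err != nil {
		return err
	}
	return nil
}

// setStatusDirect returns the result of the final call directly.
func setStatusDirect(s *idStore, ledgerID, status string) error {
	return s.updateLedgerStatus(ledgerID, status)
}

func main() {
	s := &idStore{status: map[string]string{"mychannel": "ACTIVE"}}
	fmt.Println(setStatusExplicit(s, "mychannel", "UNDER_DELETION"))
	fmt.Println(setStatusDirect(s, "missing", "UNDER_DELETION"))
}
```

The explicit form leaves room for a log line or a further call before the final return; the direct form is shorter. Neither changes what the caller observes.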

manish-sethi · 2021-07-01T20:04:45Z

core/ledger/kvledger/unjoin_channel_test.go

+	// Subsequent unjoins will not throw an error
+	require.NoError(t, UnjoinChannel(conf, ledgerID))
+	require.NoError(t, UnjoinChannel(conf, ledgerID))
+	require.NoError(t, UnjoinChannel(conf, ledgerID))


This is not the desired behavior. You got this here because you missed deleting the entry from the idStore. Otherwise, the repeated calls would have rightly returned an error.

Another aspect of the behavior is that it should be possible to create a ledger with the same ID again. That test perhaps fits better in the kvledger/tests folder, where we maintain the tests for behavior under various scenarios (including when the ledger contains private data), unlike this package, where the tests are more for very basic code coverage. Keeping that in mind, I would suggest retaining only the first and last tests in this package, as the rest would be covered in the kvledger/tests package.

manish-sethi · 2021-07-01T20:07:28Z

core/ledger/kvledger/unjoin_channel.go

+	if err := removeLedgerData(config, ledgerID); err != nil {
+		return errors.Wrapf(err, "deleting ledger [%v]", ledgerID)
+	}
+


After the data has been deleted successfully, we need to delete the ledger's entry from the idStore, leaving the peer in a state where it can join a channel with the same name again.

We would need to make a change in the boot process so that we continue to delete partially deleted ledgers, perhaps left behind by a crash during execution of this command, as we do for partially constructed ledgers. But this change can be done in a separate PR, as it would need its own tests to verify the crash consistency of the deletion.
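The flow described in this comment can be sketched as follows. This is a simplified in-memory model under stated assumptions, not the kvledger code: `peerState`, `unjoin`, and `resumePendingDeletes` are hypothetical names, and real ledger data lives on disk rather than in maps.

```go
package main

import (
	"errors"
	"fmt"
)

const statusUnderDeletion = "UNDER_DELETION"

// peerState is a hypothetical model: idStore maps ledger ID to status,
// data holds the ledger contents.
type peerState struct {
	idStore map[string]string
	data    map[string][]string
}

// unjoin marks the ledger, removes its data, then deletes the idStore entry.
func (p *peerState) unjoin(ledgerID string) error {
	if _, ok := p.idStore[ledgerID]; !ok {
		return errors.New("ledger does not exist: " + ledgerID)
	}
	p.idStore[ledgerID] = statusUnderDeletion // marker survives a crash in the next step
	delete(p.data, ledgerID)
	delete(p.idStore, ledgerID) // allows joining a channel with the same name again
	return nil
}

// resumePendingDeletes models the boot-time step that finishes deletions
// interrupted by a crash; it returns the IDs it cleaned up.
func (p *peerState) resumePendingDeletes() []string {
	var cleaned []string
	for id, status := range p.idStore {
		if status == statusUnderDeletion {
			delete(p.data, id)
			delete(p.idStore, id)
			cleaned = append(cleaned, id)
		}
	}
	return cleaned
}

func main() {
	p := &peerState{
		idStore: map[string]string{"ch1": "ACTIVE", "ch2": statusUnderDeletion},
		data:    map[string][]string{"ch1": {"block0"}, "ch2": {"block0"}},
	}
	fmt.Println(p.unjoin("ch1"))          // first unjoin succeeds
	fmt.Println(p.unjoin("ch1"))          // repeated unjoin reports a missing ledger
	fmt.Println(p.resumePendingDeletes()) // finishes the interrupted ch2 deletion
}
```

Because the idStore entry is removed last, a repeated unjoin of the same ledger returns an error instead of silently succeeding, which is the behavior requested in the test review above.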

manish-sethi · 2021-07-01T20:28:22Z

core/ledger/kvledger/unjoin_channel_test.go

+	require.NoError(t, UnjoinChannel(conf, ledgerID))
+}
+
+func TestUpdateLedgerStatus(t *testing.T) {


There is no harm in having this as a separate test, but the assertions made in this test should be made in TestUnjoinChannel above, so this test can be removed.

manish-sethi · 2021-07-01T20:30:01Z

core/ledger/kvledger/unjoin_channel_test.go

+	require.Error(t, UnjoinChannel(conf, ledgerID),
+		"as another peer node command is executing, wait for that command to complete its execution or terminate it before retrying")


To match error messages, we use either the EqualError or the Contains function.

yacovm · 2021-07-01T20:55:43Z

Hi @yacovm, thanks for the inquiry and this is a good question. There is no fundamental reason why unjoin can not operate while the peer is running. @manish-sethi articulated the pros/cons of running the unjoin in both online and offline modes - after review of approach alternatives we elected to run the unjoin in offline mode for simplicity and as a compromise to the current operational practice of pause+reset. Everyone is in agreement that there is still a need for the online unjoin channel operation.

Understood, thanks.

The reason I asked is that if the peer boots after the channel has been marked as under deletion, it will obviously not initialize the in-memory structures and goroutines of that channel, because it will first delete the ledger and then resume bootstrapping.
However, other peers will still think that the peer is in the channel. They will try to replicate blocks from it and send blocks to it, and the peer will still appear in discovery queries for a while. While this won't really harm any operations, it will pollute the logs, confuse clients (incorrect discovery results), and will surely raise questions from operators and administrators.

In gossip, I designed the ability to make the peer leave a channel (many years ago, in 2017), under the assumption that the channel leave API would be triggered while the peer is online, which gives the peer time to broadcast a message telling other peers to stop treating it as a member of the channel.

While it is still possible to artificially call JoinChan on gossip after bootstrapping, just for that channel, and then immediately call LeaveChannel to get that message broadcast, it is not effective, as the membership view will most likely be empty when the peer boots up.

So, maybe this gap should be documented at least.

The UnjoinChannel routine will mark a ledger as pending delete, removing all channel specific data from the peer. The routine must be invoked while the peer is down. Peer operators may issue the deletion by running the companion 'peer node unjoin' command from the console.

Signed-off-by: Josh Kneubuhl <jkneubuh@us.ibm.com>

manish-sethi

A couple nit comments. Can be fixed in a separate PR or with next PR.

manish-sethi · 2021-07-07T19:55:18Z

core/ledger/kvledger/unjoin_channel.go

+	logger.Infow("channel has been successfully unjoined", "ledgerID", ledgerID)
+	return nil


Unlike error strings, log statements would typically start with a full sentence.

manish-sethi · 2021-07-07T19:56:02Z

core/ledger/kvledger/unjoin_channel.go

+// UnjoinChannel removes the data for a ledger and sets the status to UNDER_DELETION.  This function is to be
+// invoked while the peer is shut down.


Stale function comment as this function now clears the UNDER_DELETION status.

jkneubuh requested a review from a team as a code owner July 1, 2021 13:37

jkneubuh force-pushed the feature/unjoin-channel branch from 745c695 to 1353dee Compare July 1, 2021 17:06

manish-sethi reviewed Jul 1, 2021

jkneubuh force-pushed the feature/unjoin-channel branch 2 times, most recently from 4458e58 to 3562edd Compare July 6, 2021 18:34

jkneubuh force-pushed the feature/unjoin-channel branch from 3562edd to 648175a Compare July 7, 2021 18:58

manish-sethi approved these changes Jul 7, 2021

manish-sethi merged commit 9736485 into hyperledger:main Jul 7, 2021

jkneubuh mentioned this pull request Jul 27, 2021

[FAB-11334] Adds a viper Cmd to unjoin a peer from a channel #2793

Merged
