Persist the cluster nodes info after applying the cluster topology #1219

git-hulk · 2023-01-07T08:17:13Z

This closes #1021

Currently, the cluster nodes' info is stored only in memory, so the cluster topology must be re-synced after every restart. This is inconvenient and confusing for most users.

Solution

Persist the cluster nodes' info to the local disk whenever the topology changes. The file is located at {config->dir}/nodes.conf and has the following format:

version 2
id 07c37dfeb235213a872192d90877d0cd55635b92
node 07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1 59129 master -  1-2 4-8193 10000 10002-11002 16381-16383
node 07c37dfeb235213a872192d90877d0cd55635b92 127.0.0.1 59137 master -  0
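To make the layout concrete, here is a minimal C++17 sketch of how a `node` line could be produced, assuming the slots a node owns are kept as a sorted list. `NodeInfo`, `FormatSlotRanges` and `FormatNodeLine` are hypothetical names, not the PR's actual code. Contiguous slots collapse into `start-end` tokens as in the example above (`1-2 4-8193 10000`).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: turn a sorted list of slot numbers into the
// space-separated range tokens used on a `node` line,
// e.g. {1, 2, 4, 5, 6, 10000} -> "1-2 4-6 10000".
std::string FormatSlotRanges(const std::vector<int>& sorted_slots) {
  std::string out;
  size_t i = 0;
  while (i < sorted_slots.size()) {
    int start = sorted_slots[i];
    int end = start;
    // Extend the current range while the next slot is contiguous.
    while (i + 1 < sorted_slots.size() && sorted_slots[i + 1] == end + 1) {
      ++i;
      end = sorted_slots[i];
    }
    if (!out.empty()) out += ' ';
    out += std::to_string(start);
    if (end != start) out += "-" + std::to_string(end);
    ++i;
  }
  return out;
}

// Hypothetical struct mirroring the fields shown in the example file.
struct NodeInfo {
  std::string id;
  std::string host;
  int port = 0;
  std::string role;        // "master" or "slave"
  std::string master_id;   // "-" when the node is a master
  std::vector<int> slots;  // sorted, ascending
};

// Build one `node` line: id, host, port, role, master id, then slot ranges.
std::string FormatNodeLine(const NodeInfo& n) {
  std::string line = "node " + n.id + " " + n.host + " " +
                     std::to_string(n.port) + " " + n.role + " " + n.master_id;
  std::string slots = FormatSlotRanges(n.slots);
  if (!slots.empty()) line += " " + slots;
  return line;
}
```

This sketch separates fields with single spaces, while the example above has two spaces before the slot list; a whitespace-tolerant parser would accept both.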

Load and parse nodes.conf at startup if cluster mode is enabled and the nodes file exists.
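The loading side could look roughly like this C++17 sketch (all names are hypothetical). A missing file is treated as "nothing to load" rather than an error, and slot tokens are expanded back into individual slots. Malformed tokens, reversed ranges and slots outside 0-16383 are rejected.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>
#include <vector>

// Redis Cluster style hash slots: 0..16383.
constexpr int kSlotCount = 16384;

// Parse one decimal slot number; nullopt if empty, non-numeric, or >= kSlotCount.
std::optional<int> ParseSlot(const std::string& s) {
  if (s.empty()) return std::nullopt;
  int value = 0;
  for (char c : s) {
    if (c < '0' || c > '9') return std::nullopt;
    value = value * 10 + (c - '0');
    if (value >= kSlotCount) return std::nullopt;  // also prevents int overflow
  }
  return value;
}

// Hypothetical inverse of the dump format: "1-2 4-6 10000" -> {1,2,4,5,6,10000}.
std::optional<std::vector<int>> ParseSlotRanges(const std::string& text) {
  std::vector<int> slots;
  std::istringstream in(text);
  std::string token;
  while (in >> token) {
    std::optional<int> start, end;
    size_t dash = token.find('-');
    if (dash == std::string::npos) {
      start = end = ParseSlot(token);  // single slot
    } else {
      start = ParseSlot(token.substr(0, dash));
      end = ParseSlot(token.substr(dash + 1));
    }
    if (!start || !end || *start > *end) return std::nullopt;
    for (int slot = *start; slot <= *end; ++slot) slots.push_back(slot);
  }
  return slots;
}

// Only load when cluster mode is on and {dir}/nodes.conf exists.
bool ShouldLoadNodesFile(bool cluster_enabled, const std::filesystem::path& dir) {
  std::error_code ec;  // non-throwing overload: errors count as "does not exist"
  return cluster_enabled && std::filesystem::exists(dir / "nodes.conf", ec);
}
```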

src/config/config.h

src/cluster/cluster.cc

src/commands/redis_cmd.cc

PragmaTwice · 2023-01-09T01:02:01Z

I have some comments on the file format.

My concern is that this is a new file format in which the comments (lines starting with #) are part of what gets parsed, which is puzzling. I think we should avoid creating both a new file format and new parsing logic. If this file is named *.conf (it would be weird to have two different file formats with the same .conf suffix in one program), we had better follow the existing kvrocks conf file format, i.e.:

# a comment that does not affect parsing...
version 1
id 0123456789012345678901234567890123456789
node 07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1 63262 master -  0-2 4-8193 10000 10002-11002 16381-16383
node ...

There are also some util functions in config/config_util.h that can help parse text in this file format.
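For illustration, a conf-style line splitter could skip blank lines and `#` comments, then split each remaining line into a key and the rest of the line. `SplitConfLine` is a hypothetical stand-in written for this example. It does not reproduce the actual helpers in config/config_util.h.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper: split one conf line into (key, value). Blank lines and
// lines whose first non-space character is '#' yield nullopt (to be skipped).
std::optional<std::pair<std::string, std::string>> SplitConfLine(const std::string& line) {
  const char* ws = " \t\r\n";
  size_t begin = line.find_first_not_of(ws);
  if (begin == std::string::npos || line[begin] == '#') return std::nullopt;
  size_t end = line.find_last_not_of(ws);
  std::string trimmed = line.substr(begin, end - begin + 1);
  size_t sep = trimmed.find_first_of(" \t");
  if (sep == std::string::npos) return std::make_pair(trimmed, std::string());
  // `trimmed` has no trailing whitespace, so a value character must follow.
  size_t value_begin = trimmed.find_first_not_of(" \t", sep);
  return std::make_pair(trimmed.substr(0, sep), trimmed.substr(value_begin));
}
```

The value keeps its internal spacing, so a `node` line can be handed unchanged to a dedicated field parser.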

git-hulk · 2023-01-09T11:03:54Z

Yes, it'd be better to keep the same format. I'll revisit it if there are other issues.

mapleFU

I still have two questions:

Do we have a spec for the conf format? @PragmaTwice's advice is OK, but why not use something like JSON? Is our format compatible with any other standard format?
Should we ignore "stale" config? What should happen if a server crashes?
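On the crash question, one common approach is to write the new contents to a temporary file, then rename it over nodes.conf. A crash mid-write then leaves either the old file or the complete new one. This approach is an assumption here; the PR does not state it. `WriteFileAtomically` is a hypothetical name, and a production version would also fsync the file and its directory.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Hypothetical sketch: write to "<path>.tmp", then rename over the real file.
// On POSIX, rename(2) replaces the target atomically, so readers never see a
// truncated nodes file. (No fsync here; a real implementation would add it.)
bool WriteFileAtomically(const std::filesystem::path& path, const std::string& content) {
  std::filesystem::path tmp = path;
  tmp += ".tmp";
  std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
  if (!out) return false;
  out << content;
  out.close();  // sets failbit if the write or close failed
  if (out.fail()) return false;
  std::error_code ec;
  std::filesystem::rename(tmp, path, ec);
  return !ec;
}
```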

src/cluster/cluster.cc

src/server/server.cc

mapleFU

If a bad record is present, or a newer format is introduced, code running on an older server would reject the file and fail to start.

So why should the server fail to start when LoadClusterNodes fails?

src/cluster/cluster.cc

src/server/server.cc

src/commands/redis_cmd.cc

mapleFU · 2023-01-09T12:52:24Z

src/cluster/cluster.cc

+        nodesInfo.append(line + "\n");
+        break;
+      default:
+        return {Status::NotOK, "got unknown parse state"};


If we support other states in the future, it seems loading would fail on older versions?

It makes sense that an older version can't parse a new state.

That's OK, but if a user wants to roll back (maybe for some other reason) after the format has been updated, the server will fail to start unless they delete the file.

Yes, I don't have a strong opinion on this scenario. My initial intention was to prevent unexpected configurations.

Does anyone else have advice? @PragmaTwice @ShooterIT @torwig
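The trade-off in this thread can be sketched as a loader with two hypothetical policies for unknown keys. "Reject" makes the server refuse to start, guarding against unexpected configurations. "Skip" logs a warning so an older binary can still start after a rollback, at the cost of ignoring data it doesn't understand. `LoadNodesText`, `UnknownKeyPolicy` and `LoadResult` are illustrative names only, and real parsing of the known keys is omitted.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

enum class UnknownKeyPolicy { kReject, kSkip };

struct LoadResult {
  bool ok = true;
  std::string error;
  int skipped = 0;  // number of unknown lines ignored under kSkip
};

// Hypothetical loader: only dispatches on the first word of each line.
LoadResult LoadNodesText(const std::string& text, UnknownKeyPolicy policy) {
  LoadResult result;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string key;
    if (!(fields >> key) || key[0] == '#') continue;  // blank line or comment
    if (key == "version" || key == "id" || key == "node") continue;  // parsing omitted
    if (policy == UnknownKeyPolicy::kReject) {
      return {false, "unknown key: " + key, result.skipped};  // refuse to start
    }
    std::cerr << "skipping unknown key: " << key << '\n';
    ++result.skipped;
  }
  return result;
}
```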

torwig · 2023-01-09T16:14:26Z

I suspect that the members of the Cluster instance are not thread-safe, e.g. simultaneous calls of SetClusterNodes and GetClusterNodes can lead to a data race. Am I right?

git-hulk · 2023-01-10T01:55:41Z

I suspect that the members of the Cluster instance are not thread-safe, e.g. simultaneous calls of SetClusterNodes and GetClusterNodes can lead to a data race. Am I right?

Yes, we should reconsider making those commands exclusive.

ShooterIT · 2023-01-10T02:03:35Z

All cluster write commands run in exclusive mode, so they are safe now.
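As a rough illustration of what "exclusive" buys here, the sketch runs ordinary commands under a shared lock and exclusive ones, such as topology updates, under a unique lock. A reader therefore never observes a half-applied topology. `CommandExecutor` and `Topology` are hypothetical, and this is not kvrocks' actual locking code.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical executor: many shared commands may run concurrently, but an
// exclusive command waits for them to finish and blocks new ones meanwhile.
class CommandExecutor {
 public:
  template <typename Fn>
  void RunShared(Fn&& fn) {
    std::shared_lock<std::shared_mutex> lock(mu_);
    fn();
  }
  template <typename Fn>
  void RunExclusive(Fn&& fn) {
    std::unique_lock<std::shared_mutex> lock(mu_);
    fn();
  }

 private:
  std::shared_mutex mu_;
};

// Hypothetical stand-in for the topology held by the Cluster instance.
struct Topology {
  int version = 0;
  std::string nodes;
};
```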

ShooterIT · 2023-01-10T02:05:14Z

Is there a way to persist the cluster topology into a RocksDB column family (CF)?

ShooterIT · 2023-01-10T02:10:51Z

Another issue: after rebooting, a replica will load the topology and try to sync with its master. So when the master receives a sync request from a replica, it should check whether the replica is in its current cluster topology: if yes, allow it; if not, reject it.

git-hulk · 2023-01-10T04:16:43Z

Another issue: after rebooting, a replica will load the topology and try to sync with its master. So when the master receives a sync request from a replica, it should check whether the replica is in its current cluster topology: if yes, allow it; if not, reject it.

The replication process carries no node ID or port, so the master cannot identify the replica in order to reject it. But we can compare the cluster version on the replica side before connecting.
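A minimal sketch of that replica-side check follows. It assumes the replica knows the version it loaded from nodes.conf and can obtain the master's version somehow; how is left out. `DecideReplicaSync` is a hypothetical name, and waiting on a mismatch is an assumed policy rather than something decided in this thread.

```cpp
#include <cassert>
#include <cstdint>

enum class SyncDecision { kConnect, kWaitForTopology };

// Hypothetical check run by a replica before (re)connecting to its master.
SyncDecision DecideReplicaSync(int64_t local_version, int64_t master_version) {
  // A mismatch suggests the persisted topology may be stale, so the replica
  // holds off until it receives the current topology.
  return local_version == master_version ? SyncDecision::kConnect
                                         : SyncDecision::kWaitForTopology;
}
```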

git-hulk · 2023-01-10T04:20:49Z

All cluster write commands run in exclusive mode, so they are safe now.

My bad, I forgot that the cluster and clusterx commands run in exclusive mode even though they have no exclusive flag.

git-hulk · 2023-01-10T04:25:50Z

Is there a way to persist the cluster topology into a RocksDB column family (CF)?

I think it's unnecessary; a local file makes it easier to modify or delete the topology manually.

PragmaTwice · 2023-01-11T05:56:59Z

It seems CI was not triggered. We can make an empty commit to retry.

git-hulk · 2023-01-11T07:30:55Z

It seems CI was not triggered. We can make an empty commit to retry.

ok

git-hulk · 2023-01-12T09:40:00Z

@ShooterIT To make the context clear, I'll merge this PR first, then file another PR to enhance the replication.

git-hulk · 2023-01-12T09:42:37Z

Thanks all, merging...

git-hulk changed the title ~~WIP: Feature/persist cluster info~~ WIP: persist the cluster nodes info Jan 7, 2023

torwig reviewed Jan 7, 2023

src/config/config.h (outdated)

torwig reviewed Jan 7, 2023

src/cluster/cluster.cc

src/cluster/cluster.cc (outdated)

git-hulk added 4 commits January 7, 2023 23:23

Implement the load/dump nodes info for cluster mode

771c54c

Add cpp unit test for the load/dump cluster nodes info

03e3e18

Add go test case

ca7bd6c

Merge branch 'unstable' into feature/persist-cluster-info

33748f9

git-hulk marked this pull request as ready for review January 8, 2023 04:12

git-hulk changed the title ~~WIP: persist the cluster nodes info~~ Persist the cluster nodes info after applying the cluster topology Jan 8, 2023

git-hulk requested review from torwig, PragmaTwice and ShooterIT January 8, 2023 04:24

torwig reviewed Jan 8, 2023

src/commands/redis_cmd.cc (outdated)

mapleFU reviewed Jan 9, 2023

src/cluster/cluster.cc

src/server/server.cc

mapleFU reviewed Jan 9, 2023

src/cluster/cluster.cc

src/cluster/cluster.cc (outdated)

src/cluster/cluster.cc

src/server/server.cc

src/commands/redis_cmd.cc (outdated)

mapleFU reviewed Jan 9, 2023

git-hulk added 2 commits January 9, 2023 21:39

Use unify config file format

f19b951

Fix naming style

de43a68

git-hulk requested review from torwig and mapleFU and removed request for mapleFU January 9, 2023 15:25

torwig approved these changes Jan 9, 2023

Merge branch 'unstable' into feature/persist-cluster-info

42168ba

PragmaTwice approved these changes Jan 12, 2023

git-hulk merged commit f1f7c10 into apache:unstable Jan 12, 2023

git-hulk mentioned this pull request Jan 12, 2023

Check if the replication is correct by the topology in cluster mode #1224 (open)

git-hulk mentioned this pull request Jul 28, 2023

Kvrocks crashed in Cluster::updateSlotsInfo() #1598 (closed)
