Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Persist the cluster nodes info after applying the cluster topology #1219

Merged
merged 7 commits into from
Jan 12, 2023
Merged

Persist the cluster nodes info after applying the cluster topology #1219

merged 7 commits into from
Jan 12, 2023

Conversation

git-hulk
Copy link
Member

@git-hulk git-hulk commented Jan 7, 2023

This closes #1021

Currently, the cluster nodes' info is only stored in memory and we need to re-sync the cluster topology after restarting. It's very inconvenient and confusing for most users.

Solutoin

  • Persist the cluster nodes' info in the local disk if the topology was changed. The file location is {config->dir}/nodes.conf and the format is below:
version 2
id 07c37dfeb235213a872192d90877d0cd55635b92
node 07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1 59129 master -  1-2 4-8193 10000 10002-11002 16381-16383
node 07c37dfeb235213a872192d90877d0cd55635b92 127.0.0.1 59137 master -  0
  • Load and parse the nodes.conf if the cluster mode is enabled and the nodes file is exists

@git-hulk git-hulk changed the title WIP: Feature/persist cluster info WIP: persist the cluster nodes info Jan 7, 2023
src/config/config.h Outdated Show resolved Hide resolved
src/cluster/cluster.cc Show resolved Hide resolved
src/cluster/cluster.cc Outdated Show resolved Hide resolved
@git-hulk git-hulk marked this pull request as ready for review January 8, 2023 04:12
@git-hulk git-hulk changed the title WIP: persist the cluster nodes info Persist the cluster nodes info after applying the cluster topology Jan 8, 2023
src/commands/redis_cmd.cc Outdated Show resolved Hide resolved
@PragmaTwice
Copy link
Member

PragmaTwice commented Jan 9, 2023

I have some comments on the file format.

I think it is a new file format that relies on what is in the comments (starting with #) as part of the parsing, which is puzzling. I think we can try not to create a new file format as well as a new parsing logic. If this file is named *.conf (it is weird to have two different file format in one program that both have suffix .conf), I think we could better refer to the previous kvrocks conf file format, i.e.

# a comment that does not affect parsing...
version 1
id 0123456789012345678901234567890123456789
node 07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1 63262 master -  0-2 4-8193 10000 10002-11002 16381-16383
node ...

And there are some util functions in config/config_util.h that can help parsing text in such file format.

@git-hulk
Copy link
Member Author

git-hulk commented Jan 9, 2023

I have some comments on the file format.

I think it is a new file format that relies on what is in the comments (starting with #) as part of the parsing, which is puzzling. I think we can try not to create a new file format as well as a new parsing logic. If this file is named *.conf (it is weird to have two different file format in one program that both have suffix .conf), I think we could better refer to the previous kvrocks conf file format, i.e.

# a comment that does not affect parsing...
version 1
id 0123456789012345678901234567890123456789
node 07c37dfeb235213a872192d90877d0cd55635b91 127.0.0.1 63262 master -  0-2 4-8193 10000 10002-11002 16381-16383
node ...

And there are some util functions in config/config_util.h that can help parsing text in such file format.

Yes, it'd be better to keep the same format, I will reconsider if it's other issues.

Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And still I have two questions:

  1. do we have spec on the conf? @PragmaTwice 's advice is ok, and why don't we using something like JSON? It's our standard compatible with other format?
  2. Should we ignore some "stale" config? What should we do if a server is crashed?

src/cluster/cluster.cc Show resolved Hide resolved
src/server/server.cc Show resolved Hide resolved
Copy link
Member

@mapleFU mapleFU left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If any bad record or new format is supported, the code running on old server would reject the format, and failed to start.

So, why do we failed when LoadClusterNodes failed?

src/cluster/cluster.cc Show resolved Hide resolved
src/cluster/cluster.cc Outdated Show resolved Hide resolved
src/cluster/cluster.cc Show resolved Hide resolved
src/server/server.cc Show resolved Hide resolved
src/commands/redis_cmd.cc Outdated Show resolved Hide resolved
nodesInfo.append(line + "\n");
break;
default:
return {Status::NotOK, "got unknown parse state"};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future, if we support other states, seems load would fail?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should make sense that the older version can't parse new state.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should make sense that the older version can't parse new state.

It's ok but, if a user want to rollback(maybe because some other reason), and the protocol is updated. He will failed to start, unless he delete the file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have no strong point in this scenario. My initial intention is to prevent unexpected configurations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To see other folks have any advice? @PragmaTwice @ShooterIT @torwig

@git-hulk git-hulk requested review from torwig and mapleFU and removed request for mapleFU January 9, 2023 15:25
@torwig
Copy link
Contributor

torwig commented Jan 9, 2023

I have a suspicion that all the members of the Cluster instance are not thread-safe, e.g. simultaneous calls of SetClusterNodes and GetClusterNodes can lead to a data race. Am I right?

@git-hulk
Copy link
Member Author

I have a suspicion that all the members of the Cluster instance are not thread-safe, e.g. simultaneous calls of SetClusterNodes and GetClusterNodes can lead to a data race. Am I right?

Yes, we should reconsider making those commands exclusive.

@ShooterIT
Copy link
Member

all cluster writing commands have exclusive, so they are safe now.

@ShooterIT
Copy link
Member

is there a way which allow we persist the cluster topology into some rocksdb CF?

@ShooterIT
Copy link
Member

another issue, since after rebooting, replica would load topology and try to sync with master, so when master receive the sync request from slave, master should check if the replica is in its current cluster topology, if yes, master allow, if no, it should reject.

@git-hulk
Copy link
Member Author

another issue, since after rebooting, replica would load topology and try to sync with master, so when master receive the sync request from slave, master should check if the replica is in its current cluster topology, if yes, master allow, if no, it should reject.

We have no node id or port in the replication process, so it cannot identify the replica and reject it on the master side. But we can compare the cluster version before connecting on the replica side.

@git-hulk
Copy link
Member Author

git-hulk commented Jan 10, 2023

all cluster writing commands have exclusive, so they are safe now.

My bad, I forget the cluster and clusterx will be in exclusive mode even if it has no exclusive flag.

@git-hulk
Copy link
Member Author

is there a way which allow we persist the cluster topology into some rocksdb CF?

I think it's unnecessary, the local file will make it easier to modify or drop it manually.

@PragmaTwice
Copy link
Member

PragmaTwice commented Jan 11, 2023

It seems CI was not triggered. We can make an empty commit to retry.

@git-hulk
Copy link
Member Author

It seems CI was not triggered. We can make an empty commit to retry.

ok

@git-hulk
Copy link
Member Author

@ShooterIT To make the context clear, I'll merge this PR first, then file another PR to enhance the replication.

@git-hulk
Copy link
Member Author

Thanks all, merging...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Persist the cluster topology after applying
5 participants