Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Abort startup if unsafe DB locale #12055

Closed
JanZerebecki opened this issue Feb 22, 2022 · 6 comments · Fixed by #12262
Closed

Abort startup if unsafe DB locale #12055

JanZerebecki opened this issue Feb 22, 2022 · 6 comments · Fixed by #12262
Assignees
Labels
T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.

Comments

@JanZerebecki
Copy link

Description

From #10718 :

Perhaps instead of a warning that should be made to abort startup unless a new setting corrup_my_db_silently_on_libc_update: true is set.

We're fine going with this approach to at least help people avoid silent corruption... details tbd.

We should probably also add the commands needed to drop the broken data from a DB to https://matrix-org.github.io/synapse/latest/postgres.html#fixing-incorrect-collate-or-ctype and also add:

We should also probably link to information on how Glibc collation changes can break PostgreSQL. E.g., https://wiki.postgresql.org/wiki/Locale_data_changes

Steps to reproduce

  • have DB use wrong locale
  • update libc with locale change
  • notice corrupt DB

Version information

  • Version: seen e.g. on 1.41.0, but bug exists before and after

  • Install method: any

  • Platform: Linux glibc
@erikjohnston
Copy link
Member

It's worth noting that we should abort for new deployments.

But I agree that we may want to start being more aggressive to try and force existing deployments to fix themselves.

@erikjohnston erikjohnston changed the title abort startup if unsafe DB locale Abort startup if unsafe DB locale Feb 23, 2022
@erikjohnston erikjohnston added the T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks. label Feb 23, 2022
@JanZerebecki
Copy link
Author

It is also worth it for existing deployments, because it happens that this becomes wrong after system upgrades (of the DB machines, not necessarily the same machines that are running synapse), DB upgrades or DB migrations.
My idea was not to force them, but require acknowledgement from an administrator by setting a configuration setting, which we should mention in the error message when aborting. As I know of multiple people, myself included, who didn't catch the warning in the log.

@callahad
Copy link
Contributor

callahad commented Feb 24, 2022

We should do this for existing deployments in an upcoming release. @jaywink you'll want to watch for when this gets resolved, as it may require config changes for EMS.

@johansmitsnl
Copy link

A live upgrade would be done like this?:

update pg_database set datcollate='C', datctype='C' where datname='synapse';
REINDEX DATABASE synapse;

@squahtx
Copy link
Contributor

squahtx commented Apr 6, 2022

Those commands most likely do the right thing, but we can't guarantee that modifying postgres system tables like that is 100% safe. It might be a good idea to take a backup before trying it.

The safest method would be to dump the database and recreate it with the correct locale.

We recommend visiting #synapse:matrix.org for help and support regarding Synapse.

@callahad
Copy link
Contributor

callahad commented Apr 6, 2022

It may be worth linking to https://wiki.postgresql.org/wiki/Locale_data_changes and https://wiki.postgresql.org/wiki/Collations which have in-depth treatment of the issues which can occur when glibc updates its locale data, and what to do about them.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
T-Task Refactoring, removal, replacement, enabling or disabling functionality, other engineering tasks.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants