This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Running redis mode (on develop) causes high CPU usage #7334

Closed
Half-Shot opened this issue Apr 23, 2020 · 5 comments
Labels
A-Workers Problems related to running Synapse in Worker Mode (or replication)

Comments

@Half-Shot
Collaborator

The CPU graph for my set of workers:
image

The GC graph:
image

Something has changed between 1.12.3 and 2e3b9a0 that has caused this to skyrocket.

Otherwise, the homeserver seems to work in terms of federating and sending messages. It's just obsessed with GCing right now.

Memory usage has remained a bit low:

image

@Half-Shot
Collaborator Author

CPU usage by requests:

image

@erikjohnston
Member

I suspect it's because each process seems to be sending out REMOTE_SERVER_UP commands at about 1 kHz. Possibly there is a loop going on?

@Half-Shot
Collaborator Author

Switching off Redis (but running the same commit) seemed to reduce CPU, and stop spamming REMOTE_SERVER_UP.

@erikjohnston
Member

I believe the cause is that, in the current TCP mode, if a worker detects that a remote server has come back online it sends a REMOTE_SERVER_UP to master, which then proxies it to the other workers. When running with redis the master process still echoes the command back, which leads to an infinite loop, as redis will echo it back to master again.
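
The loop described above can be illustrated with a small, self-contained simulation. This is only a sketch, not Synapse's replication code: `FakeRedisChannel`, `Master`, `Worker`, `simulate`, and the `max_rounds` cap are hypothetical names introduced here, and the command payload is illustrative. The channel models just one property, namely that a published command is delivered to every subscriber, including the one that published it.

```python
from collections import deque


class FakeRedisChannel:
    """Toy pub/sub channel (hypothetical): every published command is
    delivered to all subscribers, including the publisher itself."""

    def __init__(self, max_rounds):
        self.subscribers = []
        self.queue = deque()
        self.max_rounds = max_rounds  # safety cap so the demo terminates

    def publish(self, command):
        self.queue.append(command)

    def run(self):
        rounds = 0
        while self.queue and rounds < self.max_rounds:
            command = self.queue.popleft()
            rounds += 1
            for subscriber in self.subscribers:
                subscriber.on_command(command)
        return rounds


class Master:
    def __init__(self, channel, relay_to_all):
        self.channel = channel
        self.relay_to_all = relay_to_all
        self.received = 0

    def on_command(self, command):
        self.received += 1
        if self.relay_to_all:
            # Relaying to every connection means publishing back onto the
            # same channel the command arrived on, so master hears it again.
            self.channel.publish(command)


class Worker:
    def __init__(self):
        self.received = 0

    def on_command(self, command):
        self.received += 1


def simulate(relay_to_all, max_rounds=1000):
    """Return how many times master handled the command."""
    channel = FakeRedisChannel(max_rounds)
    master = Master(channel, relay_to_all)
    channel.subscribers = [master, Worker()]
    # A worker noticed a remote server coming back (payload is illustrative).
    channel.publish("REMOTE_SERVER_UP example.org")
    channel.run()
    return master.received
```

With `relay_to_all=True` the queue never drains and the run only stops at the cap, so `simulate(True)` returns `max_rounds`; with `relay_to_all=False` the command is handled once and `simulate(False)` returns 1.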

erikjohnston added a commit that referenced this issue Apr 27, 2020
For direct TCP connections we need the master to relay REMOTE_SERVER_UP
commands to the other connections so that all instances get notified
about it. The old implementation just relayed to all connections,
assuming that sending back to the original sender of the command was
safe. This is not true for redis, where commands sent get echoed back to
the sender, which was causing master to effectively infinite loop
sending and then re-receiving REMOTE_SERVER_UP commands that it sent.

The fix is to ensure that we only relay to *other* connections and not
to the connection we received the notification from.

Fixes #7334.
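
A minimal sketch of the relay rule the commit message describes, assuming each connection can be modelled as an object with a `send_command` method. `Connection` and `relay_remote_server_up` are hypothetical names for illustration and are not taken from the Synapse source; the sketch shows only the idea of skipping the connection the command arrived on.

```python
class Connection:
    """Stand-in for one replication connection (hypothetical); it records
    the commands sent on it so the relay behaviour can be inspected."""

    def __init__(self, name):
        self.name = name
        self.sent = []

    def send_command(self, command):
        self.sent.append(command)


def relay_remote_server_up(connections, origin, command):
    """Relay `command` to every connection except `origin`, the connection
    the command was received from."""
    for conn in connections:
        if conn is origin:
            continue  # never send the command back to where it came from
        conn.send_command(command)


# Usage: the command arrives on worker_a, so only worker_b and worker_c get it.
worker_a = Connection("worker_a")
worker_b = Connection("worker_b")
worker_c = Connection("worker_c")
relay_remote_server_up(
    [worker_a, worker_b, worker_c],
    origin=worker_a,
    command="REMOTE_SERVER_UP example.org",  # illustrative payload
)
```

Comparing connections by identity (`is`) rather than by name keeps the check independent of how connections are labelled.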
@Half-Shot
Collaborator Author

I can confirm #7352 fixes the issue for me.

erikjohnston added a commit that referenced this issue Apr 29, 2020
@richvdh richvdh closed this as completed Apr 29, 2020
phil-flex pushed a commit to phil-flex/synapse that referenced this issue Jun 16, 2020
@richvdh richvdh added the A-Workers Problems related to running Synapse in Worker Mode (or replication) label Feb 16, 2022