This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

in-memory federation transaction transmission queues build up indefinitely for offline servers #7828

Closed
richvdh opened this issue Jul 13, 2020 · 5 comments · Fixed by #7864
Labels: A-Performance (Performance, both client-facing and admin-facing)
richvdh (Member) commented Jul 13, 2020

When a server is offline, any events that we would have sent to it stack up in memory indefinitely.

See, for example, this graph for matrix.org:

[image: memory-usage graph for matrix.org]

Apart from the memory-usage concern, this means that if a server misses 1000 events while offline, then when it comes back we will try to send it all 1000 of those events in turn, rather than just the most recent ones in each room.
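To illustrate why replaying the whole backlog is wasteful: for state catch-up it would often suffice to remember only the latest queued event per room and send that when the server returns. The following is a minimal sketch of that idea; the function name and the `(room_id, stream_ordering, event)` tuple shape are illustrative assumptions, not Synapse's actual internals.

```python
def latest_per_room(queued_events):
    """Collapse a backlog of (room_id, stream_ordering, event) tuples
    down to the single most recent event per room.

    Instead of replaying every queued event to a returning server, we
    keep only the event with the highest stream ordering in each room.
    """
    latest = {}
    for room_id, ordering, event in queued_events:
        if room_id not in latest or ordering > latest[room_id][0]:
            latest[room_id] = (ordering, event)
    return {room: ev for room, (_, ev) in latest.items()}
```

For example, a backlog of three events in room `!a` and one in room `!b` collapses to just two events, one per room, regardless of how long the server was offline.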

richvdh (Member, Author) commented Jul 13, 2020

I suspect this was introduced way back in #2064.

I think we just need to clear the in-memory queues when handling a blacklisted server in _transaction_transmission_loop.
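The suggested fix can be sketched as follows. This is a hypothetical simplification, assuming a per-destination queue object with `_pending_pdus`/`_pending_edus` lists and a blacklist check inside the transmission loop; the class and attribute names here are illustrative, not Synapse's exact internals.

```python
class PerDestinationQueue:
    """Simplified model of a per-destination federation send queue."""

    def __init__(self, destination: str):
        self.destination = destination
        self._pending_pdus: list = []  # events queued for this server
        self._pending_edus: list = []  # ephemeral data (receipts, typing, ...)

    def _clear_queues(self) -> None:
        """Drop everything queued for this destination, freeing the memory."""
        self._pending_pdus.clear()
        self._pending_edus.clear()

    def _transaction_transmission_loop(self, destination_is_blacklisted: bool) -> None:
        if destination_is_blacklisted:
            # Sending to a known-unreachable server is pointless, and
            # keeping the queue only grows memory without bound, so
            # empty it instead of letting it accumulate.
            self._clear_queues()
            return
        # ... otherwise, batch the queued PDUs/EDUs into transactions
        # and send them to the destination ...
```

The key design point is that dropping the queue is safe only if there is a separate mechanism (the catch-up work in #2528) to resynchronise the destination later, which is why the eventual fix was landed "less aggressively" at first.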

richvdh (Member, Author) commented Jul 13, 2020

I'm chucking this onto the todo list, as it appears to be an obvious precursor to #2528.

erikjohnston added the maintenance and A-Performance (Performance, both client-facing and admin-facing) labels on Jul 13, 2020
reivilibre added a commit to matrix-org/sygnal that referenced this issue Jul 14, 2020
this morning: More Sygnal#130 (HTTP proxy) rework, I'm feeling it
straightening out a lot so hopefully it'll be back in the queue soon;

today: 'More of that'; Catch up on #2528 and #7828 which Riiich has
suggested solving first
matrix-org/synapse#2528: Homeservers don't catch up with missed traffic until someone sends another event
matrix-org/synapse#7828: in-memory federation transaction transmission queues build up indefinitely for offline servers
deepbluev7 (Contributor) commented Jul 14, 2020

This may be the reason for #7176, since that was reported shortly after #2064 was merged (I didn't check the year at first). For reference, here's my federation_sender GC times graph:

[image: federation_sender GC times graph]

tulir (Member) commented Jul 14, 2020

This graph suggests the queue build-up was happening without causing any GC leak:

[image: graph]

reivilibre added a commit that referenced this issue Jul 16, 2020
Fixes #7828.

Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
reivilibre (Contributor) commented

Will land this in a less aggressive way, but will make it more aggressive once #2528 is solved, because it will then be safe to do so.

reivilibre added a commit that referenced this issue Aug 13, 2020
…e. (#7864)

* Empty federation transmission queues when we are backing off.

Fixes #7828.

Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>

* Address feedback

Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>

* Reword newsfile