
Synapse workers #456
Merged · 88 commits · Feb 19, 2021

Conversation

@eMPee584 (Contributor) commented Apr 14, 2020

· first version: grokking this was a bit of 🤯 .. and there are probably still faults in the wiring
· ⚠️ not tested impact on UX yet
· update 2020/04/21: live testing on our 7k+ uni matrix instance 😅 💦
· update 2020/06/30: works very well to spread load, now with 10k+ users
· update 2020/09/09: currently working on expanding the patch queue to cover recent changes to synapse workers.. not much left but it might need another week.
· update 2020/09/29: about to merge with Max's contributions in #642 PR ✊♻️
· update 2020/10/14: ok, I've merged and mostly reconciled Max's branch and mine (this PR) with MDAD* mainline; now working to get the frigging PIDs from the worker processes and to fix the removal of leftovers
· update 2020/10/28: nearly finished, only missing support for setups not using inbuilt nginx.. REVIEWS, PLEASE
· update 2021/01/08: yes, still nearly finished 😁

  • (hey @spantaleev this project needs an official abbreviation/nickname lol)

@eMPee584 (Contributor Author) commented Apr 19, 2020

Turns out that even with a `when` condition on an `import_tasks` action evaluating to false, loops within the imported tasks will still be executed. How !clever a design choice is that. 🥳

[ok, further reading revealed the distinct logic behind import_tasks and include_tasks .. c.f. Server Fault - What's the difference between include_tasks and import_tasks? ]
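For illustration, a minimal sketch of the difference (hypothetical file and variable names): a `when` on a static `import_tasks` is inherited by every imported task, so loops inside those tasks still get expanded and evaluated item by item, whereas `include_tasks` skips the whole file when its condition is false.

# site.yml -- minimal sketch with hypothetical names; you'd use one or the other,
# both are shown here only for comparison
- hosts: all
  tasks:
    # Static import: the `when` is copied onto every task inside workers.yml,
    # so any `loop` in those tasks is still expanded (and each item skipped),
    # which also means the loop expression itself must be evaluable.
    - import_tasks: workers.yml
      when: matrix_synapse_workers_enabled | bool

    # Dynamic include: when the condition is false, workers.yml is never
    # loaded, so nothing inside it is evaluated at all.
    - include_tasks: workers.yml
      when: matrix_synapse_workers_enabled | bool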

eMPee584 and others added 8 commits April 19, 2020 19:05
· needs documentation; no checks yet for port clashes or typos in worker name
· according to https://github.com/matrix-org/synapse/wiki/Workers-setup-with-nginx#results
  about 90% of requests go to the synchrotron endpoint
· thus, the synchrotron worker is especially suited to be load-balanced
· most of the other workers are documented to support only a single instance
· https://github.com/matrix-org/synapse/blob/master/docs/workers.md
· 😅 How to keep this in sync with the matrix-synapse documentation?
· regex location matching is expensive
· nginx syntax limit: one location only per block / statement
· thus, lots of duplicate statements in this file
FIXME: horrid duplication in template file
@eMPee584 (Contributor Author) commented Apr 20, 2020

@spantaleev this template file has a lot of duplicate statements.. I had started to factor some of that out into a {% macro render_proxy_directives() %}, but oh the indentation .. and oh the TABs... this would have been nice: pallets/jinja#919 🤔
We tried it on our test instance though and it seems to work fine. Still waiting for a colleague to provide us with his load testing script..

@spantaleev (Owner)

One of the things that grabs my attention is the use of matrix_synapse_* variables in the matrix-nginx-proxy role (e.g. matrix_synapse_use_presence, matrix_nginx_proxy_synapse_workers_list).

We try to keep roles as separate as possible and to instead pass data along via group_vars/matrix_servers.

The easiest way to fix that would probably be by defining a new default like this in the matrix-nginx-proxy role:

# List of Synapse workers.
# Example: [{"worker": "federation_reader", "port": 5000}]
matrix_nginx_proxy_synapse_workers_list: []

.. and then connecting it to matrix_synapse_workers_enabled_list from group_vars/matrix_servers:

matrix_nginx_proxy_synapse_workers_list: "{{ matrix_synapse_workers_enabled_list }}"

We probably don't need to pass matrix_synapse_use_presence to matrix-nginx-proxy. It doesn't seem to hurt having a rule capturing location ~ ^/_matrix/client/(api/v1|r0|unstable)/presence/[^/]+/status and forwarding it to Synapse in any case (even if presence is disabled).


I'm guessing when workers are enabled, we don't support matrix_nginx_proxy_enabled: false?
If the proxy is disabled, we attempt to generate a configuration that can be used with a local nginx server (one not running in a container). In such a case, we tell other services to expose their ports on 127.0.0.1 and we reverse-proxy to those ports, and not to container-name:port.

I guess making workers work with this would be complicated.

I see that you've modified roles/matrix-synapse/templates/synapse/systemd/matrix-synapse.service.j2 to expose all worker ports (on 0.0.0.0 even). Is it necessary to do this? We may wish to expose them on 127.0.0.1 in some rare cases (like matrix_nginx_proxy_enabled: false), but I don't see why we'd do it on 0.0.0.0 all of the time.


I'm guessing that when workers are enabled, matrix-corporal likely breaks, as it's no longer capturing all traffic. Rather, some traffic gets captured in these new location blocks and goes to Synapse workers instead.

Using matrix-corporal + Synapse workers is some advanced use-case that will be difficult to solve. You should ignore it. I'm just mentioning it here for completeness.

@lovelaced

I don't have anything to add here, but I just wanted to say this would be massively useful to my organization, and workers are a REALLY important feature. I would love to be able to use the officially supported Ansible playbook for this rather than throwing something together.

@eMPee584 (Contributor Author) commented Apr 25, 2020

( ok thanks for the feedback.. this will need a couple more days to get right.. also, we're hitting some issues, matrix-org/synapse#7345 [issues were resolved by updating to synapse 1.12.4 👍 😅 ] .. so for the time being, it's strictly for the bold and courageous 😅 )

@ptman (Contributor) commented Jun 3, 2020

@eMPee584 any update? smooth sailing?

@eMPee584 (Contributor Author)

Ah, sorry, I was busy with school in recent weeks, and the convoluted nginx template also made me put off finally tackling this. I'll try and get it done in the coming week..
Yes, it does work very well.

@spantaleev (Owner)

Ah, good to hear you're still on it! Looking forward to merging this!

{% set federation_reader_worker = matrix_synapse_workers_enabled_list|selectattr('worker', 'equalto', 'federation_reader')|first %}
{% if federation_reader_worker %}
{# c.f. https://github.com/matrix-org/synapse/blame/master/docs/workers.md#L160 #}
location ~ ^(/_matrix/federation/v1/event/|/_matrix/federation/v1/state/|/_matrix/federation/v1/state_ids/|/_matrix/federation/v1/backfill/|/_matrix/federation/v1/get_missing_events/|/_matrix/federation/v1/publicRooms|/_matrix/federation/v1/query/|/_matrix/federation/v1/make_join/|/_matrix/federation/v1/make_leave/|/_matrix/federation/v1/send_join/|/_matrix/federation/v2/send_join/|/_matrix/federation/v1/send_leave/|/_matrix/federation/v2/send_leave/|/_matrix/federation/v1/invite/|/_matrix/federation/v2/invite/|/_matrix/federation/v1/query_auth/|/_matrix/federation/v1/event_auth/|/_matrix/federation/v1/exchange_third_party_invite/|/_matrix/federation/v1/user/devices/|/_matrix/federation/v1/send/|/_matrix/federation/v1/get_groups_publicised$|/_matrix/key/v2/query|/_matrix/federation/v1/groups/) {
@lovelaced commented Jul 7, 2020


/_matrix/federation/v1/groups/ should only receive GET requests on this worker; this configuration will cause federated community invites to fail. Reference: https://github.com/matrix-org/synapse/blob/master/docs/workers.md#synapseappfederation_reader

@eMPee584 (Contributor Author)


Thanks for the feedback.. Is this theoretical or have you experienced it on an actual system?
I think there is probably a different mistake in the config causing this.. we are also having some minor troubles with our setup.
I've just created an awk script to parse the endpoints directly from the upstream synapse workers documentation.. but it remains kinda nightmarish to work with.. 🤔

@lovelaced

Yes, I experienced it myself when trying to set up a fresh homeserver, so I just removed the groups endpoint and let the main Synapse process handle it instead for now.

{% set client_reader_worker = matrix_synapse_workers_enabled_list|selectattr('worker', 'equalto', 'client_reader')|first %}
{% if client_reader_worker %}
{# c.f. https://github.com/matrix-org/synapse/blame/master/docs/workers.md#L252 #}
location ^/_matrix/client/(versions$|(api/v1|r0|unstable)/(publicRooms$|rooms/.*/joined_me|rooms/.*/context/.|rooms/.*/members$|rooms/.*/messages$|rooms/.*/state$|login$|account/3pid$|keys/query$|keys/changes$|voip/turnServer$|joined_groups$|publicised_groups$|publicised_groups/|pushrules/.*$|groups/.*$|register$|auth/.*/fallback/web$)) {
@lovelaced commented Jul 13, 2020


Two of the endpoints here can only handle GET requests, and there are also two more not included here. The following list is GET-only for the client_reader worker:

^/_matrix/client/(api/v1|r0|unstable)/pushrules/.*$
^/_matrix/client/(api/v1|r0|unstable)/groups/.*$
^/_matrix/client/(api/v1|r0|unstable)/user/[^/]*/account_data/
^/_matrix/client/(api/v1|r0|unstable)/user/[^/]*/rooms/[^/]*/account_data/

(Contributor)


The "location ^/..." here, for a regex match needs to be "location ~ ^..." doesn't it?

@maxklenk (Contributor)

Redis support was added in v1.13.0, and it became the recommended method in v1.18.0.
https://github.com/matrix-org/synapse/blob/master/docs/workers.md

The workers in the current Synapse version now communicate via Redis, so we should set up Redis as well to add worker support.
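As a rough sketch of what that looks like on the Synapse side (option names as in the Synapse workers documentation of that time; the hosts and ports here are placeholder assumptions):

# homeserver.yaml (main process): enable the Redis-based replication bus
redis:
  enabled: true
  host: 127.0.0.1   # placeholder; wherever the Redis instance is reachable
  port: 6379

# example worker config: point the worker at the main process' replication listener
worker_app: synapse.app.generic_worker
worker_name: generic_worker1
worker_replication_host: 127.0.0.1
worker_replication_port: 9093   # the main process' replication listener port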

@maxklenk mentioned this pull request Sep 9, 2020
`::` leads to errors like:

> socket.gaierror: [Errno -9] Address family for hostname not supported
This leads to much easier management and potential safety
features (validation). In the future, we could try to avoid port
conflicts as well, but it didn't seem worth the effort to do it now.
Our port ranges seem large enough.

This can also pave the way for a "presets" feature
(similar to `matrix_nginx_proxy_ssl_presets`) which makes it even easier
for people to configure worker counts.
@spantaleev (Owner) commented Feb 15, 2021

85a05f3 added support for dynamically generating matrix_synapse_workers_enabled_list based on other variables.
I believe that having this makes it much easier for people to configure workers.

I think having some presets might be a useful feature that can be added easily now. Most people would then be able to just switch a single variable (matrix_synapse_workers_preset) which controls all of the individual matrix_synapse_workers_*_workers_count variables.

It'd be interesting to brainstorm what some good presets would be. I guess it could be like this:

  • minimal - a very minimal (but still useful) one is to just enable a federation_sender worker and leave all other workers disabled
  • one-of-each - the next preset could be "one of each" worker type
  • larger - after that, there could be one that is like "one of each", but has multiple generic workers (3 or 5, for example)

Note that we currently don't support running multiple federation_sender workers (see 61e427d), so we can't increase the number of those right now. We may consider fixing this.

People that need more workers than these 3 presets can go for a hybrid approach: start with a preset (say "one of each") and then increase the number of a certain worker that they need more of (e.g. matrix_synapse_workers_generic_workers_count: 15).
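Sketched in vars.yml terms (the preset variable and its values are only the proposal above, not something that exists yet; the enable toggle name is assumed), the hybrid approach could look like:

matrix_synapse_workers_enabled: true                  # assumed toggle for enabling workers at all
matrix_synapse_workers_preset: one-of-each            # proposed values: minimal / one-of-each / larger
matrix_synapse_workers_generic_workers_count: 15      # then bump just the worker type you need more of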

Another question is, which should be the default preset? I think one-of-each is most helpful and reasonable. People that enable workers are obviously on a somewhat bigger server, so going for the middle option is good. People who want to save resources can go the extra mile and go for minimal or for additional tweaking of matrix_synapse_workers_*_workers_count variables. People who want a bigger worker setup can also use one-of-each and tweak the variables that seem worth tweaking.


Do you have other ideas for useful presets? Or comments about these ones? Does having a preset option sound useful at all or should we just go for "one of each" by default and ask people to tweak the worker counts?

@ptman (Contributor) commented Feb 15, 2021

It would be great to get (voluntary) feedback on what actual worker counts end up being popular. One could start with one, or no, presets and add more as patterns are identified.

This gives us the possibility to run multiple instances of
workers that don't expose a port.

Right now, we don't support that, but in the future we could
run multiple `federation_sender` or `pusher` workers, without
them fighting over naming (previously, they'd all be named
something like `matrix-synapse-worker-pusher-0`, because
they'd all define `port` as `0`).
@spantaleev (Owner)

I've made a breaking change to the format of matrix_synapse_workers_enabled_list in d6c4d41.

People that define that manually will need to add an instanceId property to each worker instance.


In any case, it's better to refrain from defining matrix_synapse_workers_enabled_list manually, if possible, in favor of the matrix_synapse_workers_*_workers_count variables.
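For people who do define it manually, a hedged sketch of what an entry might look like now (the worker/port keys follow the example earlier in this thread, instanceId is the new property; the exact values and any further keys are illustrative):

matrix_synapse_workers_enabled_list:
  - worker: federation_sender
    instanceId: 0        # new required property; distinguishes instances of port-less workers
    port: 0              # port-less worker types keep port 0
  - worker: generic_worker
    instanceId: 18111    # illustrative
    port: 18111          # illustrative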

We're talking about a webserver running on the same machine, which
imports the configuration files generated by the `matrix-nginx-proxy`
in the `/matrix/nginx-proxy/conf.d` directory.

Users who run an nginx webserver on some other machine will need to do
something different.
Adding more presets in the future would be nice.
@spantaleev merged commit d94d0e2 into spantaleev:master on Feb 19, 2021
@spantaleev (Owner)

I think we have a cause for celebration! This is now complete and merged to master! 🎉

Huge thanks to everyone who has worked on making this happen!


There are minor things that could be improved.

For example, pusher and federation_sender workers can only run as a single instance (at most) for us. This is solvable, but requires more work and testing. It's probably not too important either, given that TU Dresden scaled to 10k+ users even without it.

Another thing that could be improved is metrics. Since #838, we do support metrics collection, but when workers are enabled the graphs become somewhat sparse. I'm guessing we'd need to collect data from the various workers as well. Maybe we also need some different graphs (splitting them per worker or something?).
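One possible direction (a sketch based on Synapse's listener configuration, not something the playbook does yet; the port is illustrative) would be giving each worker its own metrics listener for Prometheus to scrape:

# per-worker config sketch: expose a Prometheus metrics listener for this worker
worker_listeners:
  - type: metrics
    bind_addresses: ['127.0.0.1']
    port: 19101   # illustrative; each worker needs a distinct port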

@jolly-jump (Contributor)

Congratulations to those who worked on this! This saved my butt. We have only 1k users, but it seems this was a necessary step.

I have two other questions:

  • nginx worker connections
  • postgres worker connections

They are not handled by this workers configuration, I suppose. But if I tune Postgres to use "max_parallel_workers" as proposed in #532, and if I use more nginx worker connections by tampering with /matrix/nginx-proxy/nginx.conf, is it wise to count all the workers together to avoid overprovisioning CPU cores too much (I know, that is not a real thing)?

On the other hand: would it be difficult to let the nginx config (at least) be managed by this workers configuration, or is this all about Synapse only and should it therefore be a separate configuration?

spantaleev added a commit that referenced this pull request Feb 19, 2021
This probably got lost somehow in all the work that happened in
#456
@spantaleev (Owner)

We can adjust both of these (nginx workers; Postgres max connections) automatically when Synapse workers are enabled via group_vars/matrix_servers settings.

We can define variables for these things in their respective roles and then override them from group_vars/matrix_servers to some better values (for higher scale). We can even do it per preset (especially if we add larger presets that will require changes). Still, people may wish to adjust the values manually if they're running many additional workers.
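A rough sketch of what such overrides might look like from group_vars/matrix_servers or a host's vars.yml (the variable names below are illustrative, not necessarily the ones the roles expose):

# Illustrative variable names only -- check the respective roles for the real ones.
matrix_nginx_proxy_worker_connections: 4096            # more nginx worker connections for worker-heavy setups
matrix_postgres_process_extra_arguments:
  - "-c max_connections=200"                           # raise the Postgres connection limit to match the workers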

spantaleev added a commit that referenced this pull request Feb 24, 2021
spantaleev added a commit that referenced this pull request Mar 1, 2021
Mostly affects people who disable the integrated `matrix-nginx-proxy`.

Related to #456
and more specifically 4d62a75.
spantaleev added a commit that referenced this pull request Mar 2, 2021
This is something that got lost during
#456
and more specifically 4d62a75.

Fixes #914
@rltas commented Nov 11, 2022

> It's probably not too important either, given that TU Dresden scaled to 10k+ users even without it.

Just to get an idea, how many resources/workers are needed at that scale?

@eMPee584 (Contributor Author)

> It's probably not too important either, given that TU Dresden scaled to 10k+ users even without it.

> Just to get an idea, how many resources/workers are needed at that scale?

From what I remember, very few.. a couple (3-6) of synchrotron workers, plus the parallelization gained from running just a single worker of each of the other types, already had a sufficient impact compared to a monolithic Synapse process.
I'll ask the current admin team for the worker config in use now..
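Translated into the playbook's variables, such a setup would be roughly as follows (the worker count is my recollection above, not an authoritative recommendation; the enable toggle name is assumed):

matrix_synapse_workers_enabled: true                   # assumed toggle
matrix_synapse_workers_generic_workers_count: 4        # roughly the 3-6 sync-handling workers mentioned above
# plus a single instance of each of the other worker types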

@eMPee584 deleted the synapse-workers branch November 24, 2022 22:07