
Race condition where a second join races with cache invalidation over replication and erroneously rejects joining Restricted Room #13185

Open
reivilibre opened this issue Jul 5, 2022 · 3 comments
Labels
A-Testing: Issues related to testing in complement, synapse, etc.
T-Defect: Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Z-Read-After-Write: A lack of read-after-write consistency, usually due to cache invalidation races with workers.

Comments

@reivilibre
Contributor

(This is the cause of a flake in Complement's TestRestrictedRoomsLocalJoin tests when running with workers.)

If room A is a restricted room, restricted to members of room B, then if:

  1. a user first attempts to join room A and is rejected;
  2. the user then joins room B successfully; and
  3. the user then attempts to join room A again

the user can be erroneously rejected from joining room A (in step (3)).
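
A minimal client-side sketch of that sequence (the base URL, access token, and room IDs are placeholders, not values from the issue):

```python
# Reproduction sketch only; base URL, token and room IDs are placeholders.
import requests

BASE = "http://localhost:8008/_matrix/client/v3"
HEADERS = {"Authorization": "Bearer syt_placeholder_token"}
ROOM_A = "!restricted:example.org"  # restricted to members of ROOM_B
ROOM_B = "!allowed:example.org"

# (1) First join attempt at room A is rejected, priming the membership cache
#     on whichever event creator worker handles this request.
r1 = requests.post(f"{BASE}/rooms/{ROOM_A}/join", headers=HEADERS, json={})
print(r1.status_code)  # 403, as expected at this point

# (2) Join room B; the event persister invalidates its own cache and sends the
#     invalidation to other workers over replication.
r2 = requests.post(f"{BASE}/rooms/{ROOM_B}/join", headers=HEADERS, json={})
print(r2.status_code)  # 200

# (3) Immediately retry room A. If the event creator has not yet processed the
#     replicated invalidation, it still sees the stale membership and returns
#     403 even though the join should now be allowed.
r3 = requests.post(f"{BASE}/rooms/{ROOM_A}/join", headers=HEADERS, json={})
print(r3.status_code)  # expected 200, but 403 when the race is lost
```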

This is because there is a cache, `get_rooms_for_user_with_stream_ordering`, which is populated in step (1) on the event creator.
In step (2), the event persister will invalidate that cache and send a command over replication for other workers to do the same.
This issue then occurs if the event creator doesn't receive that command until after step (3).
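
As a toy model of that cross-worker race (plain Python, not Synapse code; the worker and user names are made up):

```python
# Each worker holds its own copy of the cache; the event creator only learns
# about the invalidation once the replicated command is processed.
rooms_for_user_cache = {
    "event_creator": {},
    "event_persister": {},
}

# Step (1): the failed join of room A populates the cache on the event creator
# with the user's current (empty) set of joined rooms.
rooms_for_user_cache["event_creator"]["@alice:hs"] = frozenset()

# Step (2): the event persister persists the join to room B, invalidates its own
# copy immediately, and queues an invalidation command for other workers.
rooms_for_user_cache["event_persister"].pop("@alice:hs", None)
replication_queue = [("invalidate", "get_rooms_for_user_with_stream_ordering", "@alice:hs")]

# Step (3): the second join of room A is handled on the event creator *before*
# the queued command is processed, so the restricted-join check still sees the
# stale empty set and rejects the join with a 403.
stale_rooms = rooms_for_user_cache["event_creator"]["@alice:hs"]
print("!roomB:hs" in stale_rooms)  # False -> the join to room A is rejected again
```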

[screenshot: 2022-07-05-TestRestrictedRoomsLocalJoin_race]

This occurs easily in Complement+workers on CI, where it takes ~200 ms for the invalidation to be received on the other worker (CPU contention in CI probably plays a big part).

An obvious workaround is for the client to just sleep & retry, but it seems very ugly that we're issuing 403s to clients when they're making fully serial requests.

Another incomplete workaround is for the event creator to invalidate its own cache, but that won't help if the join to room B takes place on a different event creator.

I think a potential solution that would work is to:

  1. when persisting the join event and sending the invalidation, ensure this is done before finishing the replication request and the client request
  2. when starting a new join for a restricted room on an event creator, fetch the latest cache invalidation position from Redis and then only start processing the join once replication has caught up past that point.

(I'm not exactly sure how you 'fetch the latest position'; I was rather hoping there'd be SETMAX in Redis but didn't find it. Alternatively, it's probably possible to just PING Redis and treat the PONG as the barrier, but it's not a very nice solution.)
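
A rough sketch of that two-step idea (not working Synapse code; `get_current_caches_position` and `wait_for_caches_position` are hypothetical stand-ins for "ask Redis/the cache stream writer for its latest position" and "block until local replication has caught up past it"):

```python
# Sketch only: the helper names below are hypothetical, not existing Synapse APIs.
async def join_restricted_room(handler, requester, room_id):
    # (2a) Fetch the latest cache-invalidation stream position. A PING/PONG
    #      round-trip to Redis could serve as the barrier instead, as noted above.
    position = await handler.get_current_caches_position()

    # (2b) Only start processing the join once every invalidation up to that
    #      position has been applied locally, so that
    #      get_rooms_for_user_with_stream_ordering can no longer be stale here.
    await handler.wait_for_caches_position(position)

    # Step (1) of the proposal lives on the persisting side: the invalidation
    # must be sent before the replication/client request is completed.
    return await handler.do_local_join(requester, room_id)
```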

@reivilibre added the T-Defect label on Jul 5, 2022
@reivilibre
Contributor Author

Rough conclusions from the discussion:

We have this problem in quite a few areas of Synapse. SyTest's approach here would be to retry in a loop until the 200 comes back — let's do this for now as doing anything more proper is tricky. (It would also be likely to worsen latency if all requests had to wait for things to catch up.)

In real-world situations this is unlikely to be a problem because the invalidation would usually propagate within a few milliseconds.
In theory there is a slim chance that a client cannot tell the difference between a 'real'/permanent 403 and a temporary one caused by the propagation delay, but we're not aware of this happening in practice, and it is already a potential problem with other endpoints.
Generally there are bigger issues to solve.
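
A minimal sketch of the retry-in-a-loop workaround discussed above, on the client/test side (endpoint and timings are illustrative, not from the issue):

```python
import time

import requests

def join_with_retry(base_url, token, room_id, attempts=10, delay=0.1):
    """Retry a /join that may transiently 403 while invalidations propagate."""
    url = f"{base_url}/_matrix/client/v3/rooms/{room_id}/join"
    headers = {"Authorization": f"Bearer {token}"}
    for _ in range(attempts):
        resp = requests.post(url, headers=headers, json={})
        if resp.status_code == 200:
            return resp.json()
        if resp.status_code != 403:
            resp.raise_for_status()
        # A 403 here may just mean the invalidation hasn't propagated yet; this
        # loop cannot distinguish it from a "real" 403, which is the caveat above.
        time.sleep(delay)
    resp.raise_for_status()
```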

@DMRobertson
Contributor

In a nutshell, we've written the test expecting "read-after-write consistency" but we don't have that. (I say this because I think it's useful to have the phrase in mind)

@MadLittleMods
Contributor

MadLittleMods commented Oct 6, 2022

👍 Just experienced the stale cache racing with invalidation over replication in another Complement test, but this time it's all within the same /messages request, so it's a little more insidious.

Edit: Split off to #14211

@MadLittleMods added the A-Testing label on Oct 6, 2022
MadLittleMods added a commit to matrix-org/complement that referenced this issue Oct 6, 2022
Spawning from matrix-org/synapse#14028 (comment)

The reason they failed before is that we're fighting stale caches across workers
racing while waiting for invalidation;
see matrix-org/synapse#13185 (comment)

> Here is what happens:
>
>  1. `serverB` has `event1` stored as an `outlier` from previous requests (specifically from MSC3030 jump to date pulling in a missing `prev_event` after backfilling)
>  1. Client on `serverB` calls `/messages?dir=b`
>  1. `serverB:client_reader1` accepts the request and drives things
>  1. `serverB:client_reader1` has some backward extremities in range and requests `/backfill` from `serverA`
>  1. `serverB:client_reader1` processes the events from backfill including `event1` and puts them in the `_event_persist_queue`
>  1. `serverB:master` picks up the events from the `_event_persist_queue` and persists them to the database, de-outliers `event1` and invalidates its own cache and sends them over replication
>  1. `serverB:client_reader1` starts assembling the `/messages` response and gets `event1` out of the stale cache still as an `outlier`
>  1. `serverB:client_reader1` responds to the `/messages` request without `event1` because `outliers` are filtered out
>  1. `serverB:client_reader1` finally gets the replication data and invalidates its own cache for `event1` (too late, we already got the events from the stale cache and responded)
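
As a toy illustration of steps 7-8 in that walkthrough (plain Python, not Synapse code; the event structure is made up): the reader's stale per-worker cache still marks `event1` as an outlier, and outliers are dropped when building the `/messages` response.

```python
# On serverB:client_reader1, before the replicated invalidation is processed:
stale_event_cache = {"$event1": {"event_id": "$event1", "outlier": True}}

# master has already de-outliered event1 in the database, but the cached copy wins.
timeline = [stale_event_cache["$event1"]]
response_chunk = [ev for ev in timeline if not ev["outlier"]]
print(response_chunk)  # [] -> event1 is missing from the /messages response
```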
MadLittleMods added a commit to matrix-org/complement that referenced this issue Oct 13, 2022
Spawning from matrix-org/synapse#14028 (comment)

CI failure: https://github.com/matrix-org/synapse/actions/runs/3182998161/jobs/5189731097#step:6:15343 (from [discussion](matrix-org/synapse#14028 (comment)))
```
WORKERS=1 POSTGRES=1 COMPLEMENT_ALWAYS_PRINT_SERVER_LOGS=1 COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh -run TestJumpToDateEndpoint/parallel/federation/can_paginate_after_getting_remote_event_from_timestamp_to_event_endpoint
```

### Why did this test fail with workers before?

The reason they failed before is that we're fighting stale caches across workers racing while waiting for invalidation; see matrix-org/synapse#13185 (comment)

@DMRobertson added the Z-Read-After-Write label on Oct 17, 2022