
Race condition where a second join races with cache invalidation over replication and erroneously rejects joining Restricted Room #13185

Open
reivilibre opened this issue Jul 5, 2022 · 3 comments
Labels
A-Testing: Issues related to testing in complement, synapse, etc.
T-Defect: Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Z-Read-After-Write: A lack of read-after-write consistency, usually due to cache invalidation races with workers.

Comments

@reivilibre
Contributor

(This is the cause of a flake in Complement's TestRestrictedRoomsLocalJoin tests when running with workers.)

If room A is a restricted room, restricted to members of room B, then if:

  1. a user first attempts to join room A and is rejected;
  2. the user then joins room B successfully; and
  3. the user then attempts to join room A again

the user can be erroneously rejected from joining room A (in step (3)).
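
A minimal client-side sketch of that sequence (the base URL, access token, and room IDs are placeholders, not values from the issue):

```python
# Reproduction sketch only; base URL, token and room IDs are placeholders.
import requests

BASE = "http://localhost:8008/_matrix/client/v3"
HEADERS = {"Authorization": "Bearer syt_placeholder_token"}
ROOM_A = "!restricted:example.org"  # restricted to members of ROOM_B
ROOM_B = "!allowed:example.org"

# (1) First join attempt at room A is rejected, priming the membership cache
#     on whichever event creator worker handles this request.
r1 = requests.post(f"{BASE}/rooms/{ROOM_A}/join", headers=HEADERS, json={})
print(r1.status_code)  # 403, as expected at this point

# (2) Join room B; the event persister invalidates its own cache and sends the
#     invalidation to other workers over replication.
r2 = requests.post(f"{BASE}/rooms/{ROOM_B}/join", headers=HEADERS, json={})
print(r2.status_code)  # 200

# (3) Immediately retry room A. If the event creator has not yet processed the
#     replicated invalidation, it still sees the stale membership and returns
#     403 even though the join should now be allowed.
r3 = requests.post(f"{BASE}/rooms/{ROOM_A}/join", headers=HEADERS, json={})
print(r3.status_code)  # expected 200, but 403 when the race is lost
```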

This is because there is a cache, `get_rooms_for_user_with_stream_ordering`, which is populated in step (1) on the event creator.
In step (2), the event persister will invalidate that cache and send a command over replication for other workers to do the same.
This issue then occurs if the event creator doesn't receive that command until after step (3).
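
As a toy model of that cross-worker race (plain Python, not Synapse code; the worker and user names are made up):

```python
# Each worker holds its own copy of the cache; the event creator only learns
# about the invalidation once the replicated command is processed.
rooms_for_user_cache = {
    "event_creator": {},
    "event_persister": {},
}

# Step (1): the failed join of room A populates the cache on the event creator
# with the user's current (empty) set of joined rooms.
rooms_for_user_cache["event_creator"]["@alice:hs"] = frozenset()

# Step (2): the event persister persists the join to room B, invalidates its own
# copy immediately, and queues an invalidation command for other workers.
rooms_for_user_cache["event_persister"].pop("@alice:hs", None)
replication_queue = [("invalidate", "get_rooms_for_user_with_stream_ordering", "@alice:hs")]

# Step (3): the second join of room A is handled on the event creator *before*
# the queued command is processed, so the restricted-join check still sees the
# stale empty set and rejects the join with a 403.
stale_rooms = rooms_for_user_cache["event_creator"]["@alice:hs"]
print("!roomB:hs" in stale_rooms)  # False -> the join to room A is rejected again
```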

[screenshot: 2022-07-05-TestRestrictedRoomsLocalJoin_race]

This occurs easily in Complement+workers on CI, where it takes ~200 ms for the invalidation to be received on the other worker (CPU contention in CI probably plays a big part).

An obvious workaround is for the client to just sleep & retry, but it seems very ugly that we're issuing 403s to clients when they're making fully serial requests.

Another incomplete workaround is for the event creator to invalidate its own cache, but that won't help if the join to room B takes place on a different event creator.

I think a potential solution that would work is to:

  1. when persisting the join event and sending the invalidation, ensure this is done before finishing the replication request and the client request
  2. when starting a new join for a restricted room on an event creator, fetch the latest cache invalidation position from Redis and then only start processing the join once replication has caught up past that point.

(I'm not exactly sure how you 'fetch the latest position'; I was rather hoping there'd be SETMAX in Redis but didn't find it. Alternatively, it's probably possible to just PING Redis and treat the PONG as the barrier, but it's not a very nice solution.)
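
A rough sketch of that two-step idea (not working Synapse code; `get_current_caches_position` and `wait_for_caches_position` are hypothetical stand-ins for "ask Redis/the cache stream writer for its latest position" and "block until local replication has caught up past it"):

```python
# Sketch only: the helper names below are hypothetical, not existing Synapse APIs.
async def join_restricted_room(handler, requester, room_id):
    # (2a) Fetch the latest cache-invalidation stream position. A PING/PONG
    #      round-trip to Redis could serve as the barrier instead, as noted above.
    position = await handler.get_current_caches_position()

    # (2b) Only start processing the join once every invalidation up to that
    #      position has been applied locally, so that
    #      get_rooms_for_user_with_stream_ordering can no longer be stale here.
    await handler.wait_for_caches_position(position)

    # Step (1) of the proposal lives on the persisting side: the invalidation
    # must be sent before the replication/client request is completed.
    return await handler.do_local_join(requester, room_id)
```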

@reivilibre added the T-Defect label on Jul 5, 2022
@reivilibre
Contributor Author

Rough conclusions from the discussion:

We have this problem in quite a few areas of Synapse. SyTest's approach here would be to retry in a loop until the 200 comes back — let's do this for now as doing anything more proper is tricky. (It would also be likely to worsen latency if all requests had to wait for things to catch up.)

In real-world situations this is unlikely to be a problem because the invalidation would usually propagate within a few milliseconds.
In theory there is a slim chance that a client cannot tell the difference between a 'real'/permanent 403 and a temporary one caused by the propagation delay, but we're not aware of this happening in practice, and it is already a potential problem with other endpoints.
Generally there are bigger issues to solve.
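
A minimal sketch of the retry-in-a-loop workaround discussed above, on the client/test side (endpoint and timings are illustrative, not from the issue):

```python
import time

import requests

def join_with_retry(base_url, token, room_id, attempts=10, delay=0.1):
    """Retry a /join that may transiently 403 while invalidations propagate."""
    url = f"{base_url}/_matrix/client/v3/rooms/{room_id}/join"
    headers = {"Authorization": f"Bearer {token}"}
    for _ in range(attempts):
        resp = requests.post(url, headers=headers, json={})
        if resp.status_code == 200:
            return resp.json()
        if resp.status_code != 403:
            resp.raise_for_status()
        # A 403 here may just mean the invalidation hasn't propagated yet; this
        # loop cannot distinguish it from a "real" 403, which is the caveat above.
        time.sleep(delay)
    resp.raise_for_status()
```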

@DMRobertson
Contributor

In a nutshell, we've written the test expecting "read-after-write consistency" but we don't have that. (I say this because I think it's useful to have the phrase in mind)

@MadLittleMods
Contributor

MadLittleMods commented Oct 6, 2022

👍 Just experienced the stale cache racing with invalidation over replication in another Complement test, but this time it's all within the same /messages request, so it's a little more insidious.

Edit: Split off to #14211

@MadLittleMods added the A-Testing label on Oct 6, 2022
MadLittleMods added a commit to matrix-org/complement that referenced this issue Oct 6, 2022
Spawning from matrix-org/synapse#14028 (comment)

The reason they failed before is that we're fighting stale caches across workers
racing while waiting for invalidation;
see matrix-org/synapse#13185 (comment)

> Here is what happens:
>
>  1. `serverB` has `event1` stored as an `outlier` from previous requests (specifically from MSC3030 jump to date pulling in a missing `prev_event` after backfilling)
>  1. Client on `serverB` calls `/messages?dir=b`
>  1. `serverB:client_reader1` accepts the request and drives things
>  1. `serverB:client_reader1` has some backward extremities in range and requests `/backfill` from `serverA`
>  1. `serverB:client_reader1` processes the events from backfill including `event1` and puts them in the `_event_persist_queue`
>  1. `serverB:master` picks up the events from the `_event_persist_queue` and persists them to the database, de-outliers `event1` and invalidates its own cache and sends them over replication
>  1. `serverB:client_reader1` starts assembling the `/messages` response and gets `event1` out of the stale cache still as an `outlier`
>  1. `serverB:client_reader1` responds to the `/messages` request without `event1` because `outliers` are filtered out
>  1. `serverB:client_reader1` finally gets the replication data and invalidates its own cache for `event1` (too late, we already got the events from the stale cache and responded)
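
As a toy illustration of steps 7-8 in that walkthrough (plain Python, not Synapse code; the event structure is made up): the reader's stale per-worker cache still marks `event1` as an outlier, and outliers are dropped when building the `/messages` response.

```python
# On serverB:client_reader1, before the replicated invalidation is processed:
stale_event_cache = {"$event1": {"event_id": "$event1", "outlier": True}}

# master has already de-outliered event1 in the database, but the cached copy wins.
timeline = [stale_event_cache["$event1"]]
response_chunk = [ev for ev in timeline if not ev["outlier"]]
print(response_chunk)  # [] -> event1 is missing from the /messages response
```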
MadLittleMods added a commit to matrix-org/complement that referenced this issue Oct 13, 2022
Spawning from matrix-org/synapse#14028 (comment)

CI failure: https://github.com/matrix-org/synapse/actions/runs/3182998161/jobs/5189731097#step:6:15343 (from [discussion](matrix-org/synapse#14028 (comment)))
```
WORKERS=1 POSTGRES=1 COMPLEMENT_ALWAYS_PRINT_SERVER_LOGS=1 COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh -run TestJumpToDateEndpoint/parallel/federation/can_paginate_after_getting_remote_event_from_timestamp_to_event_endpoint
```

### Why did this test fail with workers before?

The reason they failed before is that we're fighting stale caches across workers racing while waiting for invalidation; see matrix-org/synapse#13185 (comment)

@DMRobertson added the Z-Read-After-Write label on Oct 17, 2022