Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Fix backfilled events being rejected for no state_groups #10439

Merged
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions changelog.d/10439.bugfix
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Fix events with floating outlier state being rejected over federation.
6 changes: 4 additions & 2 deletions synapse/handlers/federation.py
Original file line number Diff line number Diff line change
Expand Up @@ -2711,9 +2711,11 @@ async def _update_auth_events_and_context_for_auth(
event.event_id,
e.event_id,
)
context = await self.state_handler.compute_event_context(e)
missing_auth_event_context = (
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any linting we can add to avoid re-assigning or mask existing variables?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erikjohnston Still curious if there is some tooling we can use to avoid this problem?

Since Python is not block scoped, I think we do this all over on purpose but would be nice to be more conscious about it.

Copy link
Contributor Author

@MadLittleMods MadLittleMods Jul 30, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From some cursory googling around Python linting (black, flake8, mypy, type hinting, etc), I'm not finding anything around constants, declarations vs assignments, and re-assignments 😥

Even re-assignments of function parameters like you would see with ESlint no-param-reassign

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like there is a handful of related violations in the wemake-python-styleguide best practices section. But none of these will specifically stop function parameter re-assignment for this exact bug scenario (tried it in a code playground).

Dev notes

Interesting comparison of linting type tools: https://github.com/wemake-services/wemake-python-styleguide#what-we-are-about


Related but not related links (just linking for my own reference):

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created wemake-services/wemake-python-styleguide#2128 to propose a new linting rule for this scenario 🐣

Copy link
Contributor Author

@MadLittleMods MadLittleMods Oct 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use Semgrep to find all of the scenarios in the codebase where we do this.

await self.state_handler.compute_event_context(e)
)
await self._auth_and_persist_event(
origin, e, context, auth_events=auth
origin, e, missing_auth_event_context, auth_events=auth
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Main fix is here ^

First stumbled upon in #10245 (comment)

Reproducible on a federated homeserver when there is a membership auth event as a floating outlier. Then when we try to backfill one of that persons messages, it has missing membership auth to fetch which caused us to mistakenly replace the context for the message with that of the floating membership outlier event. Since outliers have no state or state_group, the error bubbles up when we continue down the persisting route: sqlite3.IntegrityError: NOT NULL constraint failed: event_to_state_groups.state_group

Call stack:

backfill
_auth_and_persist_event
_check_event_auth
# Modifies the context (where the bug is)
_update_auth_events_and_context_for_auth

We touch some of the same code on the the receive PDU route but if it is able to reach this line, it is able to regenerate the context and recover. The cases where we return before and it can reproduce is when it's another outlier, no different_auth, or auth_event is from another room.

on_receive_pdu
_process_received_pdu
_auth_and_persist_event
_check_event_auth
# Modifies the context (where the bug is)
_update_auth_events_and_context_for_auth

Copy link
Contributor Author

@MadLittleMods MadLittleMods Jul 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main fix disappeared from the diff here because it was merged alongside #10245. Full diff before merges is view-able via https://github.com/matrix-org/synapse/pull/10439/files/eb32018cb3d9e048fa4b3523a6fb29b1e2b36faf

This PR is still relevant to add the regression test for it though.

)

if e.event_id in event_auth_events:
Expand Down
132 changes: 132 additions & 0 deletions tests/handlers/test_federation.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
# See the License for the specific language governing permissions and
# limitations under the License.
import logging
from typing import List
from unittest import TestCase

from synapse.api.constants import EventTypes
Expand All @@ -22,6 +23,7 @@
from synapse.logging.context import LoggingContext, run_in_background
from synapse.rest import admin
from synapse.rest.client.v1 import login, room
from synapse.util.stringutils import random_string

from tests import unittest

Expand All @@ -39,6 +41,8 @@ def make_homeserver(self, reactor, clock):
hs = self.setup_test_homeserver(federation_http_client=None)
self.handler = hs.get_federation_handler()
self.store = hs.get_datastore()
self.state_store = hs.get_storage().state
self._event_auth_handler = hs.get_event_auth_handler()
return hs

def test_exchange_revoked_invite(self):
Expand Down Expand Up @@ -190,6 +194,134 @@ def test_rejected_state_event_state(self):

self.assertEqual(sg, sg2)

def test_backfill_floating_outlier_membership_auth(self):
"""
As the remote federated homeserver, check that we can properly process
an event with auth_events that include a floating membership event from
the OTHER_SERVER.

Regression test, see #10439.
"""

def filter_auth_events_on_event(event: EventBase, auth_events: List[EventBase]):
# Create a StateMap[str]
auth_event_state_map = {
(e.type, e.state_key): e.event_id for e in auth_events
}
# Actually strip down and use the necessary auth events
auth_event_ids = self._event_auth_handler.compute_auth_events(
event=event,
current_state_ids=auth_event_state_map,
for_verification=False,
)

# Replace the auth_events with the stripped down ones
event.auth_events = auth_event_ids
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You shouldn't be able to edit the auth events directly (as it invalidates all the hashes), so I am loathed to rely on this.

Instead, can we use an event builder to create the event? Something like:

builder = hs.get_event_builder_factory().for_room_version(room_version, event_dict)
event = await builder.build(
    prev_event_ids=[fake_prev_event_id],
    auth_events=compute_auth_events(builder, state),
    depth=depth,
)

Copy link
Contributor Author

@MadLittleMods MadLittleMods Jul 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've switched to the event builder but I still have to edit the signatures to make them appear from the OTHER_SERVER. Otherwise they are rejected as not being signed by that server.

This seems reasonable as I think the event hashes are the same because we pop off the signatures when we compute them.


OTHER_SERVER = "otherserver"
OTHER_USER = "@otheruser:" + OTHER_SERVER

# create the room
user_id = self.register_user("kermit", "test")
tok = self.login("kermit", "test")
room_id = self.helper.create_room_as(
room_creator=user_id,
is_public=True,
tok=tok,
extra_content={
"preset": "public_chat",
},
)
room_version = self.get_success(self.store.get_room_version(room_id))

prev_event_ids = self.get_success(self.store.get_prev_events_for_room(room_id))
(
most_recent_prev_event_id,
most_recent_prev_event_depth,
) = self.get_success(self.store.get_max_depth_of(prev_event_ids))
# mapping from (type, state_key) -> state_event_id
prev_state_map = self.get_success(
self.state_store.get_state_ids_for_event(most_recent_prev_event_id)
)
# List of state event ID's
prev_state_ids = list(prev_state_map.values())
auth_event_ids = prev_state_ids
auth_events = list(
self.get_success(self.store.get_events(auth_event_ids)).values()
)
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved

# build a floating outlier member state event
fake_prev_event_id = "$" + random_string(43)
member_event = event_from_pdu_json(
{
"type": EventTypes.Member,
"content": {
"membership": "join",
},
"state_key": OTHER_USER,
"room_id": room_id,
"sender": OTHER_USER,
"depth": most_recent_prev_event_depth,
"prev_events": [fake_prev_event_id],
"auth_events": auth_event_ids.copy(),
"origin_server_ts": self.clock.time_msec(),
"signatures": {
OTHER_SERVER: {"ed25519:key_version": "SomeSignatureHere"}
},
},
room_version,
outlier=True,
)
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved
filter_auth_events_on_event(member_event, auth_events)

# build and send an event authed based on the member event
raw_auth_events = auth_events + [member_event]
message_event = event_from_pdu_json(
{
"type": EventTypes.Message,
"content": {},
"room_id": room_id,
"sender": OTHER_USER,
"depth": most_recent_prev_event_depth,
"prev_events": prev_event_ids.copy(),
"auth_events": [ev.event_id for ev in raw_auth_events],
"origin_server_ts": self.clock.time_msec(),
"signatures": {
OTHER_SERVER: {"ed25519:key_version": "SomeSignatureHere"}
},
},
room_version,
)
filter_auth_events_on_event(message_event, raw_auth_events.copy())

# Stub the /event_auth response from the OTHER_SERVER
async def get_event_auth(destination, room_id, event_id):
return [
auth_event
for auth_event in raw_auth_events
if auth_event.type is EventTypes.Create
or auth_event.type is EventTypes.PowerLevels
or auth_event.type is EventTypes.JoinRules
or (
auth_event.type is EventTypes.Member
and auth_event.state_key == message_event.sender
)
]
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved

self.handler.federation_client.get_event_auth = get_event_auth

with LoggingContext("receive_pdu"):
d = run_in_background(
self.handler.on_receive_pdu, OTHER_SERVER, message_event
)
self.get_success(d)

# Try and get the events
stored_event = self.get_success(
self.store.get_event(message_event.event_id, allow_none=True)
)
self.assertTrue(stored_event is not None)

@unittest.override_config(
{"rc_invites": {"per_user": {"per_second": 0.5, "burst_count": 3}}}
)
Expand Down