Skip to content

room version 12 error in unit test tests.storage.databases.main.test_room.RoomBackgroundUpdateStoreTestCase.test_populate_stats_broken_rooms #19780

@jason-famedly

Description

@jason-famedly

In working on bring up the unit tests for room version 12 compatibility, I hit a bit of an extremely niche error situation. This particular test fails with a KeyError on a missing room_id in creation events for v12 rooms. This appears to be because creation events for v12 rooms do not have a room_id in their JSON. This is to be expected due to MSC4291.

The test itself seems fine. The point of it is to "break" a room by removing the value in the room_version column in rooms before running. The "bug" appears to show inside the depths of the event fetching code. It is only hit when the rooms table is missing it's room_version value. EDIT: It also seems to happen when the room is an unknown version

There was an ancient issue about out-of-band invites that led to introducing some simple guessing about what a room version should be based on the format version of an event. This conditional code is what is being hit with this test. But when trying to the raise a InvalidEventError, the room_id is missing so a KeyError is raised instead.

Traceback of exception
Traceback (most recent call last):
  File "/home/jason/dev/famedly/synapse-upstreaming/tests/storage/databases/main/test_room.py", line 187, in test_populate_stats_broken_rooms
    self.run_background_updates("populate_stats_process_rooms")
  File "/home/jason/dev/famedly/synapse-upstreaming/tests/storage/databases/main/test_room.py", line 68, in run_background_updates
    self.wait_for_background_updates()
  File "/home/jason/dev/famedly/synapse-upstreaming/tests/unittest.py", line 495, in wait_for_background_updates
    self.get_success(
  File "/home/jason/dev/famedly/synapse-upstreaming/tests/unittest.py", line 742, in get_success
    return self.successResultOf(deferred)
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-vUbQZmJ1-py3.13/lib/python3.13/site-packages/twisted/trial/_synctest.py", line 732, in successResultOf
    self.fail(
twisted.trial.unittest.FailTest: Success result expected on <Deferred at 0x7a8ddf333690 current result: None>, found failure result instead:
Traceback (most recent call last):
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-vUbQZmJ1-py3.13/lib/python3.13/site-packages/twisted/internet/defer.py", line 1853, in _inlineCallbacks
    result = context.run(
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-vUbQZmJ1-py3.13/lib/python3.13/site-packages/twisted/python/failure.py", line 467, in throwExceptionIntoGenerator
    return g.throw(self.value.with_traceback(self.tb))
  File "/home/jason/dev/famedly/synapse-upstreaming/synapse/storage/background_updates.py", line 577, in do_next_background_update
    await self._do_background_update(desired_duration_ms)
  File "/home/jason/dev/famedly/synapse-upstreaming/synapse/storage/background_updates.py", line 620, in _do_background_update
    items_updated = await update_handler(progress, batch_size)
  File "/home/jason/dev/famedly/synapse-upstreaming/synapse/storage/databases/main/stats.py", line 221, in _populate_stats_process_rooms
    await self._calculate_and_set_initial_state_for_room(room_id)
  File "/home/jason/dev/famedly/synapse-upstreaming/synapse/storage/databases/main/stats.py", line 577, in _calculate_and_set_initial_state_for_room
    state_event_map = await self.get_events(event_ids, get_prev_content=False)  # type: ignore[attr-defined]
  File "/home/jason/dev/famedly/synapse-upstreaming/synapse/logging/opentracing.py", line 949, in _wrapper
    return await func(*args, **kwargs)
  File "/home/jason/dev/famedly/synapse-upstreaming/synapse/storage/databases/main/events_worker.py", line 624, in get_events
    events = await self.get_events_as_list(
  File "/home/jason/dev/famedly/synapse-upstreaming/synapse/logging/opentracing.py", line 949, in _wrapper
    return await func(*args, **kwargs)
  File "/home/jason/dev/famedly/synapse-upstreaming/synapse/logging/opentracing.py", line 949, in _wrapper
    return await func(*args, **kwargs)
  File "/home/jason/dev/famedly/synapse-upstreaming/synapse/storage/databases/main/events_worker.py", line 680, in get_events_as_list
    event_entry_map = await self.get_unredacted_events_from_cache_or_db(
  File "/home/jason/dev/famedly/synapse-upstreaming/synapse/logging/opentracing.py", line 949, in _wrapper
    return await func(*args, **kwargs)
  File "/home/jason/dev/famedly/synapse-upstreaming/synapse/storage/databases/main/events_worker.py", line 929, in get_unredacted_events_from_cache_or_db
    missing_events: dict[str, EventCacheEntry] = await delay_cancellation(
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-vUbQZmJ1-py3.13/lib/python3.13/site-packages/twisted/internet/defer.py", line 1187, in __iter__
    yield self
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-vUbQZmJ1-py3.13/lib/python3.13/site-packages/twisted/internet/defer.py", line 1857, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "/home/jason/dev/famedly/synapse-upstreaming/synapse/storage/databases/main/events_worker.py", line 915, in get_missing_events_from_cache_or_db
    raise e
  File "/home/jason/dev/famedly/synapse-upstreaming/synapse/storage/databases/main/events_worker.py", line 908, in get_missing_events_from_cache_or_db
    db_missing_events = await self._get_events_from_db(
  File "/home/jason/dev/famedly/synapse-upstreaming/synapse/logging/opentracing.py", line 949, in _wrapper
    return await func(*args, **kwargs)
  File "/home/jason/dev/famedly/synapse-upstreaming/synapse/storage/databases/main/events_worker.py", line 1527, in _get_events_from_db
    d["room_id"], event_id, original_ev.event_id
builtins.KeyError: 'room_id'

The chances of having this particular situation in a room version 12 room seem extraordinarily low, if not impossible so I would not say this is a high priority issue. That said I see multiple ways of remedying this test "failure"

  • change the test to break in a different way
  • adjust the event fetching code to accommodate events with no room_id in their raw JSON
    • Begin catching a KeyError in addition to the InvalidEventError here.
    • Change the condition here to include filtering out creation events. This has a side effect of needing to bring the section below up-to-date to handle newer room versions or another error will be hit when converting the raw JSON into its FrozenEvent*. This then produces a different event_id to what is expected and tries to raise a DatabaseCorruptionError which fails with another KeyError instead because room_id is still missing from the raw JSON.
    • Change EventRow(and it's associated SQL) to retrieve the room_id as a separate data point, there by avoiding the KeyError issue altogether.

I vote for the last one

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions