Port Event class to Rust#19701
Conversation
41d9ebb to
1c4406b
Compare
001cddc to
1a6900f
Compare
48b0729 to
f1705f2
Compare
68b0ec2 to
7c5c0b7
Compare
db0b3f3 to
2b8272e
Compare
Small prerequisites for porting the Python EventBase hierarchy to Rust: - duration: make `from_milliseconds` const and add an `IntoPyObject` impl for owned `SynapseDuration`, so the new Rust `Event.sticky_duration()` can return one directly to Python. - internal_metadata: rename `copy()` to `deep_copy()` (matching the new naming used by the rest of the events module) and make `new()` callable from sibling modules. - json_object: expose `object` as a `pub` field and add a `get_field` helper so the new Event class can read from it without going through Python. - signatures, unsigned: add `deep_copy()` methods so the new Event class can implement its own deep-copy.
Adds a single `Event` Rust pyclass that replaces the Python EventBase /
FrozenEventV{1,2,3,4,VMSC4242} hierarchy. The class is added but not yet
wired into Python — callers continue to use the existing Python classes
in this commit; the migration follows in the next commit.
The internals use an `FormattedEvent` over
`EventFormatV{1,2V3,4,VMSC4242}` structs sharing an `EventCommonFields`.
Format-specific behaviour (prev_event_ids, auth_event_ids, room_id
derivation for v12 create events, etc) is encapsulated per variant.
Event IDs are computed in the constructor for v3+ formats; v1/v2 use the
`event_id` field as-is.
Two supporting Rust modules are added at the same time:
- `events::constants` — string constants for event types, top-level
fields, and per-event-type content fields, used to keep the redaction
rules and field accessors readable.
- `events::utils` — `redact()`, `compute_event_reference_hash()`, and
`calculate_event_id()`, ported from `synapse.crypto.event_signing` /
`synapse.events.utils`.
Replace the abstract `synapse.events.EventBase` and the concrete `FrozenEvent`, `FrozenEventV2`, `FrozenEventV3`, `FrozenEventV4`, and `FrozenEventVMSC4242` Python classes with a single Rust-backed `Event`, exposed via `synapse.synapse_rust.events.Event`. `EventBase` becomes a `TypeAlias` for `Event` so that the existing type annotations across the codebase keep working. Notable behavioural notes: - `make_event_from_dict()` now constructs the Rust class. Event IDs for v3+ formats are computed in the constructor (instead of lazily on first access). - `clone_event()` is now a single `event.deep_copy()` call. The old shallow copy of `unsigned` was effectively a deep copy in practice; `deep_copy()` matches that. - The third-party event-rules callback no longer needs to call `event.freeze()` — Events are immutable from Python by construction. - A small `assert_never` is added in `events_worker.py` to make the `redact_behaviour` switch exhaustive now that the type checker can see all branches. All test fixtures that constructed `FrozenEventV3` etc. directly are updated to construct `Event` instead.
Adapt tests that mutated Python event internals (`_event_id`, `_dict`, direct attribute assignment, `FrozenEventV3(...)` construction) to work with the new Rust-backed `Event` class: - Rebuild events via `make_event_from_dict` / `make_test_event` instead of patching attributes in place. - Plumb `rejected_reason` through `_join_rules_event` rather than assigning to `rejected_reason` after construction. - Replace the hand-built event in `test_msc4242_state_dag` with a `Mock(spec=EventBase)` since the test only needs a handful of attributes. - Add `# type: ignore` for the deprecated `event.user_id` / `event[key]` accessors and for assigning to `event.content`. - In `make_test_event`, drop the default `room_id` for v11+ create events so each gets a distinct hash-derived room ID.
We have to take a slightly different approach here as we can't subclass the native Event type.
2b8272e to
a5081d1
Compare
Now that we do a bit more validadtion of events, it's possible that an event persisted in the database may now not pass validation. This shouldn't happen, but let's handle it correctly by logging and returning that we couldn't find the event. This is the same as what we do if we can't parse the JSON.
| /// The event ID. For format v1 this is read directly from the JSON; | ||
| /// for v2+ it is computed from the canonical-JSON hash at | ||
| /// construction time and cached here. | ||
| event_id: Arc<str>, |
There was a problem hiding this comment.
What benefit are we getting from this being a Arc<str> vs an owned String?
Also a general question of the Arc/Box usage throughout these new types
There was a problem hiding this comment.
Ah yeah, so Box<str> vs String is mostly about space, since String has to store a capacity it takes up more memory. It also has mild advantage that is really indicates its immutable.
In this particular case, we also use Arc<str> so that we can point to the same event_id which is defined in EventFormatV1. (Will add a comment to that effect).
| //! On-the-wire representations of Matrix events, parameterised by event format | ||
| //! version. |
There was a problem hiding this comment.
We could explain this a little more plainly. Like "event JSON" and "parsed".
Reading far enough, I see a "Serialization and deserialization" but I think we could explain a bit here.
| //! don't need to be parsed up front. Generally, optional fields should be | ||
| //! handled via `other_fields`, as this saves space when they are not present. |
There was a problem hiding this comment.
Generally, optional fields should be handled via
other_fields, as this saves space when they are not present.
These details should be on described next to other_fields
| with self.assertRaises(NotImplementedError): | ||
| isinstance(object(), EventProtocol) |
There was a problem hiding this comment.
We could still test this raises NotImplementedError right?
| ev: EventBase, state: StateMap[EventBase] | ||
| ) -> tuple[bool, JsonDict | None]: | ||
| ev.content = {"x": "y"} | ||
| ev.content = {"x": "y"} # type: ignore[misc] |
There was a problem hiding this comment.
Explain type ignore (comment)
| room_version=RoomVersions.MSC4242v12, | ||
| ) | ||
| ev._event_id = id # type: ignore[attr-defined] | ||
| assert supports_msc4242_state_dag(ev) |
There was a problem hiding this comment.
Why no longer using real events?
| def __iter__(self) -> Iterator[str]: ... | ||
| def __eq__(self, other: object) -> bool: ... | ||
|
|
||
| class Event: |
There was a problem hiding this comment.
Need to manually transfer docstrings
| } | ||
|
|
||
| #[getter(state_key)] | ||
| fn state_key_attr(&self) -> PyResult<&str> { |
There was a problem hiding this comment.
Why state_key_attr naming?
a87f1e0 to
7115164
Compare
| /// flat object matching the Matrix spec. | ||
| #[derive(Serialize, Deserialize)] | ||
| pub struct FormattedEvent<E = Arc<EventFormatEnum>> { | ||
| /// The event's signatures. This is a mutable field. |
There was a problem hiding this comment.
| /// The event's signatures. This is a mutable field. | |
| /// The event's signatures. Kept separate from common/specific fields as this is a mutable field. |
(also applies for unsigned)
| /// The `signatures` and `unsigned` fields are kept as dedicated typed | ||
| /// wrappers because they round-trip between Rust and Python repeatedly | ||
| /// and benefit from caching. `common_fields` and `specific_fields` are |
There was a problem hiding this comment.
I'm not following exactly. I would think common_fields/specific_fields are the ones that are shared/cached because they don't change.
How are signatures/unsigned being cached?
This also goes against what's written above AFAICT:
//! The `signatures` and `unsigned` fields are kept distinct from the
//! common/specific as they allow mutation. When copying an event they need to
//! be deep-copied, but the common/specific fields (which are immutable) can be
//! shared.
| specific_fields: Arc::clone(&self.specific_fields), | ||
| common_fields: Arc::clone(&self.common_fields), |
There was a problem hiding this comment.
👌 Perfect. I feel like we should drop a note about the writable aspect protecting us here.
|
|
||
| def clone_event(event: EventBase) -> EventBase: | ||
| """Take a copy of the event. | ||
| """Take a copy of the event.""" |
There was a problem hiding this comment.
With the newly learned details from #19701 (comment), we're luckily protected out of the box 👍
Perhaps more along these lines of a hint:
| """Take a copy of the event.""" | |
| """ | |
| Take a copy of the event. | |
| Only `unsigned`/`signatures` is mutable. Mutating other properties will result in `AttributeError` (not writable). | |
| """ |
| if event.fields.common_fields.type_state_key_tuple() != Some((M_ROOM_CREATE, "")) | ||
| && auth_event_ids.is_empty() | ||
| { | ||
| return Err(PyRuntimeError::new_err( |
There was a problem hiding this comment.
The "assert" language signals an invariant check. It better describes "this should never happen if code is correct" vs. "something went wrong during execution"
| return Err(PyRuntimeError::new_err( | ||
| "auth_event_ids is unexpectedly empty for a non-create event", | ||
| )); |
There was a problem hiding this comment.
More:
| return Err(PyRuntimeError::new_err( | |
| "auth_event_ids is unexpectedly empty for a non-create event", | |
| )); | |
| "auth_event_ids has not been calculated for event_id={}. All events (aside from the `m.room.create` event should have some `auth_event_ids` set.", |
Even more:
| return Err(PyRuntimeError::new_err( | |
| "auth_event_ids is unexpectedly empty for a non-create event", | |
| )); | |
| "Expected auth_event_ids for event_id={}. All events (aside from the `m.room.create` event) should have some `auth_event_ids` set. Either the `auth_event_ids` have not been calculated yet or this is a Synapse programming error.", |
Ports the event class to Rust.
The main difference here are:
Reviewable commit-by-commit.
Overview of Event Rust structure
The format of the event struct in Rust is quite different than that in Python.
The top-level looks like:
which includes the actual parsed event in
FormattedEvent, plus the rest of the event metadata.The struct is further split into the common fields, format specific fields, plus the signatures and unsigned. We split out the signature and unsigned fields as they are mutable, so when we clone the event we can still share the common and specific fields and only copy signature and unsigned.
The
specific_fieldsare the fields that depend on the format version. They can either be a specific format (e.g.E = EventFormatV1) or a type-erased enumEventFormatEnumthat is across all room versions:For example:
Dev notes
As discussed in
#element-backend-internal:matrix.org