### Description
After upgrading to 1.152.0, the `event_worker` crashes repeatedly with an unhandled `KeyError: 'quarantined_media'` in `on_POSITION`. This tears down the Redis replication connection every ~3 seconds, causing the sync worker to stop receiving events and bridges to fail delivering E2EE decryption keys.
### Steps to reproduce
Root Cause
`on_POSITION` in `synapse/replication/tcp/handler.py` line 635 does a direct dict lookup:

    stream = self._streams[cmd.stream_name]

If a worker receives a POSITION for a stream it doesn't own (e.g. `quarantined_media` on the event worker), this raises an unhandled `KeyError`, which tears down the Twisted connection.
Note: adding `quarantined_media: ["media_worker"]` to `stream_writers` in homeserver.yaml does not help, because `WriterLocations.__init__()` rejects it as an unexpected keyword argument.
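
To illustrate the failure mode outside of Synapse, here is a minimal, self-contained sketch. `PositionCommand` and `Handler` are hypothetical stand-ins invented for this example, not Synapse's real classes; only the two lookup styles mirror the lines quoted in this report.

```python
import logging

logger = logging.getLogger(__name__)


class PositionCommand:
    """Hypothetical stand-in for a POSITION command; only stream_name is modelled."""

    def __init__(self, stream_name: str) -> None:
        self.stream_name = stream_name


class Handler:
    """Hypothetical stand-in for the replication command handler."""

    def __init__(self, streams: dict) -> None:
        # Maps stream name -> stream object for the streams this worker knows.
        self._streams = streams

    def on_position_unguarded(self, cmd: PositionCommand):
        # Direct lookup: raises KeyError for a stream this worker does not know.
        return self._streams[cmd.stream_name]

    def on_position_guarded(self, cmd: PositionCommand):
        # Guarded lookup: unknown streams are logged and ignored.
        stream = self._streams.get(cmd.stream_name)
        if stream is None:
            logger.debug("Ignoring POSITION for unknown stream %s", cmd.stream_name)
            return None
        return stream


# A worker that only knows the "events" stream receives a POSITION
# for "quarantined_media".
handler = Handler({"events": object()})
cmd = PositionCommand("quarantined_media")

try:
    handler.on_position_unguarded(cmd)
    raised = False
except KeyError:
    raised = True

ignored = handler.on_position_guarded(cmd)
```

In this sketch the unguarded call raises `KeyError`, while the guarded call returns `None` and leaves the caller running. Whether silently ignoring the command is the right behaviour for Synapse itself is a question for the maintainers.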
### Homeserver
homeserver
### Synapse Version
1.152.0
### Installation Method
Docker (matrixdotorg/synapse)
### Database
PostgreSQL 18
### Workers
Multiple workers
### Platform
OS │ Ubuntu 24.04 (Oracle Cloud)
Kernel │ 6.17.0
Arch │ aarch64 (ARM64)
Docker │ 27.4.1
Python │ 3.13.13
Synapse │ 1.152.0 (matrixdotorg/synapse:latest)
Database │ PostgreSQL 18
Deployment │ Docker, worker setup with Redis replication
### Configuration
homeserver.yaml (relevant excerpt):

    stream_writers:
      events: ["event_worker"]
      receipts: ["event_worker"]
      typing: ["event_worker"]
      presence: ["event_worker"]
      to_device: ["event_worker"]
      account_data: ["event_worker"]

worker_event.yaml:

    worker_app: synapse.app.generic_worker
    worker_name: event_worker
    worker_listeners:
      - port: 8083
        bind_addresses: ['127.0.0.1']
        type: http
        resources:
          - names: [client, federation, replication]
            compress: true

worker_media.yaml:

    worker_app: synapse.app.generic_worker
    worker_name: media_worker
    worker_listeners:
      - type: http
        port: 8085
        resources:
          - names: [media, replication]

### Relevant log output
```shell
CRITICAL - sentinel - Unhandled Error
Traceback (most recent call last):
File ".../twisted/internet/posixbase.py", line 491, in _doReadOrWrite
File ".../twisted/internet/tcp.py", line 250, in doRead
File ".../txredisapi.py", line 1858, in dataReceived
File ".../synapse/replication/tcp/redis.py", line 178, in messageReceived
File ".../synapse/replication/tcp/redis.py", line 219, in handle_command
File ".../synapse/replication/tcp/handler.py", line 635, in on_POSITION
builtins.KeyError: 'quarantined_media'
```
### Anything else that would be useful to know?
Fix

    stream = self._streams.get(cmd.stream_name)
    if stream is None:
        logger.debug("Ignoring POSITION for unknown stream %s", cmd.stream_name)
        return

The fix/workaround needs to be applied to all workers, not just the event worker.
Every worker subscribes to Redis pub/sub and receives all POSITION broadcasts,
including for streams it doesn't own.
Affected workers in our setup: event_worker, sync_worker, media_worker,
federation_worker, push_worker.