TL;DR
Our three-tier agent onboarding is Paste A → Paste B → Paste C, with a visible Element effect at each tier. Pastes A and B are shipped and verified end-to-end. Paste C (SAS / interactive verification so Element shows the bot as "verified by you") is not shipping because the Python SDK we standardized on — matrix-nio — does not have reliable high-level SAS support, and hand-rolling the state machine on top of its primitives did not produce a stable flow.
Plan: switch the responder to mautrix-python for the verification layer. mautrix-python is the SDK behind the Python-based mautrix bridges (mautrix-telegram, for example; others such as mautrix-whatsapp are built on mautrix-go), drives real SAS flows in production, and is already what hermes-agent's gateway uses internally. This is not a fork and not a rewrite of our stack; it's using the right tool for this layer.
This issue documents what we tried, why it didn't work, and the recommended next steps.
Context
The three-tier onboarding in the signup page (/signup) is designed so an agent — or a new human — progresses through visible Element states:
| Paste | End state in Element | Status |
| --- | --- | --- |
| A | Bot alive, E2EE DM decrypts | ✅ shipped |
| A | Element shows yellow "encrypted by a device not verified by its owner" | (expected intermediate) |
| B | MSK/SSK/USK published; warning drops to "not verified by you" | ✅ shipped (POST /signup/api/crosssign) |
| C | User clicks Verify → SAS emoji flow → green shield | 🚧 blocked on SDK |
Paste A uses matrix-nio[e2e] (libolm-backed crypto) and works on weak open models (verified with gpt-oss-120b via OpenRouter): end-to-end onboarding, an encrypted DM, and the server-assigned event_id returned as un-fakeable proof of success.
Paste B is a server-side helper (knock-approver/approver.py). It generates three Ed25519 keypairs for the new user, signs SSK and USK with MSK, signs the user's device with SSK, and uploads the results via /_matrix/client/v3/keys/device_signing/upload and /_matrix/client/v3/keys/signatures/upload. We verified this with /keys/query, which shows master_keys, self_signing_keys, user_signing_keys, and the device signed by SSK.
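As a rough illustration of what Paste B has to assemble, the sketch below builds the two upload bodies following the Matrix spec's cross-signing key format and JSON signing rules. It is not the code in approver.py: all function names are hypothetical, and the actual Ed25519 signing is left to a caller-supplied sign callable (assumed to return an unpadded-base64 signature) so the example stays standard-library only. The auth field required by user-interactive authentication is also left out.

```python
import json
from typing import Callable

# Assumption: the caller supplies Ed25519 signing; it returns unpadded base64.
Signer = Callable[[bytes], str]


def canonical_json(obj: dict) -> bytes:
    """Matrix canonical JSON: sorted keys, no extra whitespace, UTF-8."""
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def signable(obj: dict) -> bytes:
    """Bytes to sign: the object without its signatures and unsigned fields."""
    stripped = {k: v for k, v in obj.items() if k not in ("signatures", "unsigned")}
    return canonical_json(stripped)


def cross_signing_key(user_id: str, usage: str, public_key: str) -> dict:
    """Unsigned cross-signing key object for one usage (master, self_signing, ...)."""
    return {
        "user_id": user_id,
        "usage": [usage],
        "keys": {f"ed25519:{public_key}": public_key},
    }


def add_signature(obj: dict, user_id: str, signer_public: str, sign: Signer) -> dict:
    """Return a copy of obj with a signature keyed by the signer's public key."""
    signatures = {u: dict(s) for u, s in obj.get("signatures", {}).items()}
    signatures.setdefault(user_id, {})[f"ed25519:{signer_public}"] = sign(signable(obj))
    signed = dict(obj)
    signed["signatures"] = signatures
    return signed


def device_signing_upload_body(
    user_id: str, msk_pub: str, ssk_pub: str, usk_pub: str, sign_with_msk: Signer
) -> dict:
    """Body for /keys/device_signing/upload: SSK and USK are signed by MSK."""
    ssk = cross_signing_key(user_id, "self_signing", ssk_pub)
    usk = cross_signing_key(user_id, "user_signing", usk_pub)
    return {
        "master_key": cross_signing_key(user_id, "master", msk_pub),
        "self_signing_key": add_signature(ssk, user_id, msk_pub, sign_with_msk),
        "user_signing_key": add_signature(usk, user_id, msk_pub, sign_with_msk),
    }


def signatures_upload_body(
    user_id: str, device_keys: dict, ssk_pub: str, sign_with_ssk: Signer
) -> dict:
    """Body for /keys/signatures/upload: the device keys object signed by SSK."""
    signed = add_signature(device_keys, user_id, ssk_pub, sign_with_ssk)
    return {user_id: {device_keys["device_id"]: signed}}
```

Keeping the signer as a parameter means the payload shape can be unit-tested with a stub, while the real key material stays wherever the helper already holds it.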
Paste C would add a SAS verification handler to the responder so that when the user clicks "Verify" on the bot's profile in Element, the Matrix verification state machine runs through m.key.verification.{request,ready,start,accept,key,mac,done} without any human-on-the-bot-side intervention.
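To make the event sequence concrete, here is a minimal sketch that encodes the order listed above and reports which step should come next. It is a simplification under stated assumptions: it ignores which side sends each event, it ignores that both sides send key and mac, and the names SAS_STEPS and next_expected are hypothetical.

```python
from typing import Iterable, Optional

# Step names as listed in the paragraph above. This ordering ignores which side
# sends each event and the fact that both sides send `key` and `mac`.
SAS_STEPS = tuple(
    "m.key.verification." + name
    for name in ("request", "ready", "start", "accept", "key", "mac", "done")
)
CANCEL = "m.key.verification.cancel"


def next_expected(seen: Iterable[str]) -> Optional[str]:
    """Return the step that should follow the furthest step observed so far.

    Returns None once `done` was seen or the flow was cancelled. Flows that
    begin directly at `start` (no request/ready) are handled because only the
    furthest observed step matters.
    """
    events = list(seen)
    if CANCEL in events:
        return None
    reached = [SAS_STEPS.index(e) for e in events if e in SAS_STEPS]
    nxt = max(reached) + 1 if reached else 0
    return SAS_STEPS[nxt] if nxt < len(SAS_STEPS) else None
```

For a log that contains only m.key.verification.start, next_expected returns m.key.verification.accept, which is a quick way to name the point at which a flow went quiet.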
What we tried (matrix-nio)
Reference callback shape, modeled on matrix-nio docs and examples:
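    from nio import (
        KeyVerificationStart, KeyVerificationKey,
        KeyVerificationMac, KeyVerificationCancel,
    )

    # `client` is the responder's logged-in nio.AsyncClient, created elsewhere.
    async def on_verification(event):
        tx = event.transaction_id
        if isinstance(event, KeyVerificationStart):
            # Note: per the Matrix spec, "m.sas.v1" is the value of the start
            # event's `method` field; short_authentication_string lists display
            # methods such as "decimal" and "emoji".
            if "m.sas.v1" in event.short_authentication_string:
                await client.accept_key_verification(tx)
        elif isinstance(event, KeyVerificationKey):
            # Auto-confirm the short authentication string (no human on the bot side).
            await client.confirm_short_auth_string(tx)
        elif isinstance(event, KeyVerificationMac):
            sas = client.key_verifications.get(tx)
            if sas:
                try:
                    await client.to_device(sas.get_mac())
                except Exception:
                    # Best effort: get_mac() can fail if the Sas state is not ready.
                    pass

    # Route the verification to-device events to the handler above.
    client.add_to_device_callback(on_verification, (
        KeyVerificationStart, KeyVerificationKey,
        KeyVerificationMac, KeyVerificationCancel,
    ))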
To test, we wrote a driver (tests/test_sas_verify.py, not committed) that:
- signs up a fresh bot via /signup/api
- runs /signup/api/crosssign (Paste B) so the bot has MSK/SSK/USK
- launches the Paste A+C responder
- as a separate matrix-nio client (standing in for the user's Element), fetches the bot's device via /_matrix/client/v3/keys/query, injects it into the verifier's device_store, and calls start_key_verification(device) to send the initial m.key.verification.start
What happened
The bot's log consistently showed exactly one event:
    [bot verif] KeyVerificationStart tx=...
Then silence. The verifier never received m.key.verification.key from the bot, so confirm_short_auth_string never fired, no MACs were exchanged, and the run ended at the 30-second timeout.
Things we ruled out:
- ❌ bot crashed (process stayed up, sync loop healthy, !ping still returned pong)
- ❌ device_signed: false (re-crosssigned after bot uploaded device keys; device_signed: true confirmed via /keys/query)
- ❌ nio-specific event-routing issue ("cast a wide net" variant registering on ToDeviceEvent base class behaved identically)
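For the second item, a check along the following lines can be run against a /keys/query response. This is a sketch under assumptions: how the device_signed flag is actually computed is not shown in this issue, the function names are hypothetical, and the check only looks for the presence of a signature keyed by the user's SSK without verifying it cryptographically.

```python
from typing import Optional


def self_signing_public_key(keys_query: dict, user_id: str) -> Optional[str]:
    """Extract the user's SSK public key from a /keys/query response body."""
    ssk = keys_query.get("self_signing_keys", {}).get(user_id)
    if not ssk:
        return None
    # A cross-signing key object carries a single entry in `keys`.
    return next(iter(ssk.get("keys", {}).values()), None)


def device_has_ssk_signature(keys_query: dict, user_id: str, device_id: str) -> bool:
    """True if the device entry lists a signature keyed by the user's SSK.

    Presence only: the signature itself is not verified here.
    """
    ssk_pub = self_signing_public_key(keys_query, user_id)
    device = keys_query.get("device_keys", {}).get(user_id, {}).get(device_id)
    if ssk_pub is None or device is None:
        return False
    user_sigs = device.get("signatures", {}).get(user_id, {})
    return f"ed25519:{ssk_pub}" in user_sigs
```

A presence check is enough to catch the ordering problem described above (cross-signing run before the bot uploaded its device keys), but it says nothing about whether the signature is valid.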
Root cause (as best we can tell)
matrix-nio's Sas class (in nio/crypto/sas.py) is a partial implementation: it exposes the primitives (share_key, get_mac, confirm_short_auth_string) but does not orchestrate the full state machine for you. The caller is expected to:
- know when to call share_key vs. wait for nio's sync loop to have implicitly done so
- emit the m.key.verification.accept event as a side effect of accept_key_verification, and in some cases also emit m.key.verification.key via share_key(); whether that second send is required or redundant seems to depend on whether we are the initiator or the responder
- handle the MAC at the right moment (calling get_mac() when the internal state isn't ready raises or returns None)
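One way to make the first two points explicit is to have the responder send its key itself right after accepting, which is the pattern used in matrix-nio's emoji verification example. The sketch below is untested against the setup described in this issue, so whether it changes the observed stall is an open question. The function name is hypothetical, and client is duck-typed (anything with accept_key_verification, to_device, and key_verifications, as nio's AsyncClient has) so the snippet runs without nio installed.

```python
SAS_METHOD = "m.sas.v1"


async def accept_and_share_key(client, event) -> bool:
    """Accept a SAS start event, then explicitly send our key.

    Returns False (and sends nothing) if the event is not an emoji-capable
    SAS start. `event` is expected to look like nio's KeyVerificationStart.
    """
    # The verification method lives in `method`; the SAS display options
    # ("decimal", "emoji") live in `short_authentication_string`.
    if getattr(event, "method", None) != SAS_METHOD:
        return False
    if "emoji" not in event.short_authentication_string:
        return False

    tx = event.transaction_id
    # Sends m.key.verification.accept.
    await client.accept_key_verification(tx)
    # Explicitly send m.key.verification.key rather than relying on the
    # sync loop to do it implicitly.
    sas = client.key_verifications[tx]
    await client.to_device(sas.share_key())
    return True
```

Inside a callback this would replace the KeyVerificationStart branch; the key and MAC branches still need their own handling.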
Upstream matrix-nio has open issues about verification flow fragility (see issues labeled verification in the poljar/matrix-nio repo). The Rust SDK (matrix-rust-sdk) has a bootstrap_cross_signing() helper and higher-level verification orchestration; Python nio doesn't.
Options (re-evaluated)
- Switch verification to mautrix-python (✅ recommended)
  - mature, drives mautrix bridges in production
  - same dependency hermes-agent's own gateway already uses (no new dependency tree)
  - has OlmMachine-level verification helpers that coordinate the full flow
  - trade-off: heavier setup (aiosqlite/asyncpg store for OlmMachine state) and a more verbose API; Paste A's responder can stay on nio, or we migrate that too
- Augment matrix-nio with our own SAS state machine (possible)
  - ~150 lines against raw to-device events + python-olm primitives, bypassing nio's Sas class
  - keeps a single SDK surface for agents
  - trade-off: we now own Matrix verification spec compliance; any drift is on us
- Fork matrix-nio (overkill, not recommended)
  - a fork would drift faster than we could maintain it; the project isn't at a scale that justifies it
- Wait for upstream nio (❌ rejected)
  - that's what we were doing; no sign of movement. Not a real option.
- Rust SDK bindings (matrix-sdk-ffi, matrix-sdk-crypto-js) (possible, higher friction)
  - canonical implementation, best feature coverage
  - Python ergonomics are rougher than mautrix-python for a small bot
Proposed follow-up work
- Switch responder.py from matrix-nio to mautrix-python. ~200 lines of rewrite. Verify the same !ping round-trip works.
- Add an OlmMachine-driven verification callback that handles the full SAS flow (Element's handle_verification pattern from the bridges).
- Update the paste on /signup and in skills/matrix-invite-join/SKILL.md with the mautrix-python version.
Interim guidance for users
Paste C is not live. For trust-by-you, compare the bot's MSK fingerprint out-of-band: it's returned as msk_public in the POST /signup/api/crosssign response. Open the bot's profile in Element; Element shows the same fingerprint under "Security." If they match, the bot is the one you onboarded. This is the same security signal SAS gives you, just done with eyes instead of emojis.
Broader take
Matrix has the right threat model for a federated E2EE messenger, but its Python client ecosystem has significant integration gaps for anyone trying to build bots / agents that do more than "send and receive plaintext." Paste A is enough to demonstrate the whole stack working end-to-end from scratch in one paste-to-agent; that's real and shippable. Paste B proves out the server-side cross-signing path, which is also real. Paste C is a good aha-moment story for demos, but it is not demo-quality reliable on matrix-nio as it stands. We should stop banging on that door and walk through the one next to it (mautrix-python).
Labels: roadmap, matrix-sdk, verification