Self-hosted MatrixRTC LiveKit call disconnects after ~15s on Android with UNKNOWN_ERROR #3925

@MortezaJavadian

Steps to reproduce

Hi, thanks for taking a look at this.

I am running a self-hosted MatrixRTC setup with LiveKit for Element Call / Element X, behind nginx. I am seeing a disconnect issue on Android that I have not been able to narrow down yet.

The call starts normally at first:

  1. Element / MatrixRTC is configured to use a self-hosted LiveKit SFU through the JWT service.
  2. I join a call from Element X on Android.
  3. The participant joins the room successfully.
  4. The publisher PeerConnection connects.
  5. The microphone track is published.
  6. RTP starts flowing.
  7. After around 15 to 20 seconds, the call disconnects.

On Android, Element X only shows this user-facing error:

UNKNOWN_ERROR

From the LiveKit side, the interesting part is that the WebSocket signal connection appears to be closed by the client first:

finishing WS connection ... "closedByClient": true
signal stream closed ... "error": null

Shortly after that, the subscriber transport fails and LiveKit closes the signal connection with TRANSPORT_FAILURE:

ice connection state change ... "transport": "SUBSCRIBER", "state": "failed"
peer connection state change ... "transport": "SUBSCRIBER", "state": "failed"
ignoring prefer candidate check by ICE failure because signal connection interrupted
closing signal connection ... "reason": "TRANSPORT_FAILURE"

The participant is then closed with:

participant closing ... "reason": "PEER_CONNECTION_DISCONNECTED"

The RTP stats also show very high upstream audio packet loss (about 52%) before the participant is closed:

rtp stats ... "direction": "upstream", "packetsExpected": 264, "packetsSeenPrimary": 127, "packetsLost": 137, "packetLostPercentage": 51.893936, "rtt": 59
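As a sanity check on that stats line, the reported percentage is consistent with a plain lost/expected ratio. This is only a sketch of the arithmetic using the counters quoted above; it does not claim to reproduce how LiveKit derives the value internally.

```python
# Counters copied from the "rtp stats" log line quoted above.
packets_expected = 264
packets_seen_primary = 127
packets_lost = 137

# The seen and lost counters add up to the expected count.
assert packets_seen_primary + packets_lost == packets_expected

# Loss as a percentage of expected packets.
loss_pct = 100.0 * packets_lost / packets_expected
print(f"{loss_pct:.4f}%")
```

So roughly every second upstream audio packet in that window never reached the SFU, even though the reported RTT is only 59.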

Before I enabled full debug logging in the LiveKit config, I had also seen errors like these in the logs:

dtls timeout
peerconnection disconnected

My setup is:

  • LiveKit server: livekit/livekit-server:v1.10.1
  • LiveKit JWT service: ghcr.io/element-hq/lk-jwt-service:0.4.4
  • Synapse: ghcr.io/element-hq/synapse:v1.150.0
  • Element Web: ghcr.io/element-hq/element-web:v1.12.6
  • Deployment: Docker Compose
  • Reverse proxy: nginx
  • TLS is terminated at nginx for the LiveKit WebSocket/API endpoint
  • MatrixRTC SFU URL: wss://<MATRIX_DOMAIN>/livekit/sfu
  • MatrixRTC JWT URL: https://<MATRIX_DOMAIN>/livekit/jwt
  • Client from the logs: Element X / LiveKit JS SDK 2.16.0, Android 13, Wi-Fi

LiveKit is started like this:

livekit:
  image: livekit/livekit-server:v1.10.1
  command: ["--config", "/etc/livekit/config.yaml", "--node-ip", "<PUBLIC_SERVER_IP>"]
  volumes:
    - ${LIVEKIT_DATA_PATH}:/etc/livekit:ro
    - ${LIVEKIT_TLS_CERT_DIR}:/etc/lk-certs:ro
  ports:
    - "7881:7881/tcp"
    - "50201-50501:50201-50501/udp"
    - "3478:3478/udp"
    - "5349:5349/tcp"
    - "50502-50601:50502-50601/udp"

The main RTC/TURN part of the LiveKit config is:

port: 7880

rtc:
  tcp_port: 7881
  node_ip: <PUBLIC_SERVER_IP>
  port_range_start: 50201
  port_range_end: 50501
  use_external_ip: false
  allow_tcp_fallback: true
  strict_acks: false
  reconnect_on_publication_error: true
  reconnect_on_subscription_error: true
  reconnect_on_data_channel_error: true

turn:
  enabled: true
  domain: <MATRIX_DOMAIN>
  udp_port: 3478
  tls_port: 5349
  relay_range_start: 50502
  relay_range_end: 50601
  external_tls: false
  cert_file: /etc/lk-certs/live/<MATRIX_DOMAIN>/fullchain.pem
  key_file: /etc/lk-certs/live/<MATRIX_DOMAIN>/privkey.pem

The JWT service points clients to:

LIVEKIT_URL: "wss://<MATRIX_DOMAIN>/livekit/sfu"

nginx proxies /livekit/sfu to LiveKit port 7880 with WebSocket upgrade enabled:

location = /livekit/sfu {
  proxy_pass http://livekit_up/;
  proxy_http_version 1.1;
  proxy_send_timeout 3600s;
  proxy_read_timeout 3600s;
  proxy_buffering off;
  proxy_set_header Upgrade $http_upgrade;
  proxy_set_header Connection "upgrade";
  proxy_set_header Host $host;
  proxy_set_header X-Real-IP $remote_addr;
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  proxy_set_header X-Forwarded-Proto $scheme;
}
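For the nginx path question, here is a minimal model of how this one block maps paths, in case it helps reviewers. It covers only an exact-match location (location = ...) whose proxy_pass carries a URI part, as in the block above. The function name upstream_path is hypothetical, and the /livekit/sfu/rtc request path is an assumption about what the client might append to the SFU URL, not something taken from the logs.

```python
from typing import Optional


def upstream_path(location: str, proxy_uri: str, request_path: str) -> Optional[str]:
    """Model an exact-match nginx location whose proxy_pass has a URI part.

    Returns the path sent upstream, or None when this location does not
    apply to the request at all. Query strings are ignored because nginx
    matches locations against the path only.
    """
    if request_path != location:
        return None  # "location =" only matches the identical path
    # The matched part of the request URI is replaced by the proxy_pass URI.
    return proxy_uri


# Request for exactly /livekit/sfu is forwarded upstream as "/".
print(upstream_path("/livekit/sfu", "/", "/livekit/sfu"))

# ASSUMPTION: a client that appends a suffix such as /rtc to the SFU URL.
# Such a request would not be handled by this block at all.
print(upstream_path("/livekit/sfu", "/", "/livekit/sfu/rtc"))
```

Since the participant does join successfully, signal requests clearly reach LiveKit somehow, so another location block (not shown here) may be handling any suffixed paths.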

Outcome

Expected:

The participant should remain connected after joining the call and publishing audio. Both publisher and subscriber PeerConnections should stay healthy.

Actual:

The call starts successfully, but after around 15 to 20 seconds it disconnects. On Android, Element X shows UNKNOWN_ERROR.

From the LiveKit logs, the sequence looks like this:

starting signal connection ... "Client":{"sdk":1,"version":"2.16.0","protocol":16,"os":"Android","os_version":"13","network":"wifi"}

sent signal response ... "iceServers": [{"urls": ["turn:<PUBLIC_SERVER_IP>:3478?transport=udp", "turns:<MATRIX_DOMAIN>:443?transport=tcp"]}]

ice connection state change ... "transport": "PUBLISHER", "state": "connected"

mediaTrack published ... "kind": "audio", "mime": "audio/opus"

quality drop ... "direction": "up", "reason": "packet", "packetLostPercentage": 39.86928

error reading data channel ... "label": "_lossy", "error": "abort chunk, with following errors: (User Initiated Abort: Close called)"
error reading data channel ... "label": "_reliable", "error": "abort chunk, with following errors: (User Initiated Abort: Close called)"

finishing WS connection ... "closedByClient": true

ice connection state change ... "transport": "SUBSCRIBER", "state": "failed"
peer connection state change ... "transport": "SUBSCRIBER", "state": "failed"

closing signal connection ... "reason": "TRANSPORT_FAILURE"

participant closing ... "reason": "PEER_CONNECTION_DISCONNECTED"

rtp stats ... "packetsExpected": 264, "packetsSeenPrimary": 127, "packetsLost": 137, "packetLostPercentage": 51.893936

I am not sure if this is just a network/NAT packet loss issue, a mistake in my LiveKit/TURN/nginx configuration, or an interoperability issue between Element X / MatrixRTC and this LiveKit setup.

The part I find most confusing is the order of events. The client seems to close the signal WebSocket first, then the subscriber ICE transport fails, and then LiveKit reports TRANSPORT_FAILURE / PEER_CONNECTION_DISCONNECTED.
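To make that ordering explicit, the sketch below walks the excerpted log lines in the order they appear in this report and records where each key event first shows up. The marker names are hypothetical labels chosen for illustration, and the ordering is by position in the excerpt only, since the excerpts here carry no timestamps.

```python
# Log excerpts copied from this report, in the order they appear.
EXCERPT = [
    'ice connection state change ... "transport": "PUBLISHER", "state": "connected"',
    'mediaTrack published ... "kind": "audio", "mime": "audio/opus"',
    'quality drop ... "direction": "up", "reason": "packet", "packetLostPercentage": 39.86928',
    'finishing WS connection ... "closedByClient": true',
    'ice connection state change ... "transport": "SUBSCRIBER", "state": "failed"',
    'closing signal connection ... "reason": "TRANSPORT_FAILURE"',
    'participant closing ... "reason": "PEER_CONNECTION_DISCONNECTED"',
]

# Hypothetical labels mapped to the substring that identifies each event.
MARKERS = {
    "client_closed_ws": '"closedByClient": true',
    "subscriber_failed": '"transport": "SUBSCRIBER", "state": "failed"',
    "server_closed_signal": '"reason": "TRANSPORT_FAILURE"',
}


def first_index(lines, needle):
    """Return the position of the first line containing needle, else None."""
    for i, line in enumerate(lines):
        if needle in line:
            return i
    return None


order = {name: first_index(EXCERPT, needle) for name, needle in MARKERS.items()}
# Sort the events that were found by their position in the excerpt.
timeline = sorted((n for n in order if order[n] is not None), key=order.get)
print(timeline)
```

With full debug logs, the same idea could be applied to the real timestamps to confirm whether the client-side WebSocket close really precedes the subscriber ICE failure.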

Since Android only shows UNKNOWN_ERROR, it is difficult to tell from the client side what the real root cause is.

Any guidance on what I should check next would be appreciated, especially around TURN advertisement, nginx path setup, and whether this level of upstream packet loss would be enough to explain the Android UNKNOWN_ERROR.
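On the TURN advertisement point, one rough cross-check is to compare the ports in the iceServers URLs from the signal response against the ports in the turn: section of the config. The sketch below does that for the values quoted in this report. Mapping the turn scheme to udp_port and the turns scheme to tls_port is an assumption made for illustration, and mismatches is a hypothetical helper name.

```python
import re

# URLs copied from the "sent signal response" log line in this report.
ADVERTISED = [
    "turn:<PUBLIC_SERVER_IP>:3478?transport=udp",
    "turns:<MATRIX_DOMAIN>:443?transport=tcp",
]

# ASSUMPTION: turn -> turn.udp_port, turns -> turn.tls_port from the config above.
CONFIGURED = {"turn": 3478, "turns": 5349}

URL_RE = re.compile(r"^(turns?):(.+):(\d+)\?transport=(\w+)$")


def mismatches(urls, configured):
    """Return (url, configured_port) pairs where the advertised port differs."""
    out = []
    for url in urls:
        m = URL_RE.match(url)
        if not m:
            continue  # skip anything that is not a turn/turns URL of this shape
        scheme, port = m.group(1), int(m.group(3))
        if configured.get(scheme) != port:
            out.append((url, configured.get(scheme)))
    return out


for url, expected in mismatches(ADVERTISED, CONFIGURED):
    print(f"{url} advertises a different port than the configured {expected}")
```

For the values in this report, the turns URL advertises port 443 while tls_port is 5349, and the Compose file publishes 5349/tcp for the LiveKit container. Whether that difference matters depends on what is listening on 443 in front of LiveKit, which is not shown here.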

Operating system

Android 13

Browser information

Not using a browser directly.

The client is Element X on Android. The LiveKit server logs identify the client as:

LiveKit JS SDK 2.16.0
OS: Android 13
Network: Wi-Fi

URL for webapp

Private self-hosted Matrix / Element deployment.

Element Web image: ghcr.io/element-hq/element-web:v1.12.6
LiveKit JWT URL: https://<MATRIX_DOMAIN>/livekit/jwt
LiveKit SFU URL: wss://<MATRIX_DOMAIN>/livekit/sfu

Will you send logs?

Yes.

I can provide redacted LiveKit debug logs from the server side. I have not submitted client debug logs yet, but I can try to reproduce the issue again and submit feedback with debug logs linked to this issue.

Labels

T-Defect: Something isn't working: bugs, crashes, hangs, vulnerabilities, or other reported problems
