Steps to reproduce
Hi, thanks for taking a look at this.
I am running a self-hosted MatrixRTC setup with LiveKit for Element Call / Element X, behind nginx. I am seeing a disconnect issue on Android that I have not been able to narrow down yet.
The call starts normally at first:
- Element / MatrixRTC is configured to use a self-hosted LiveKit SFU through the JWT service.
- I join a call from Element X on Android.
- The participant joins the room successfully.
- The publisher PeerConnection connects.
- The microphone track is published.
- RTP starts flowing.
- After around 15 to 20 seconds, the call disconnects.
On Android, Element X only shows this user-facing error:
UNKNOWN_ERROR
From the LiveKit side, the interesting part is that the WebSocket signal connection appears to be closed by the client first:
finishing WS connection ... "closedByClient": true
signal stream closed ... "error": null
Shortly after that, the subscriber transport fails and LiveKit closes the signal connection with TRANSPORT_FAILURE:
ice connection state change ... "transport": "SUBSCRIBER", "state": "failed"
peer connection state change ... "transport": "SUBSCRIBER", "state": "failed"
ignoring prefer candidate check by ICE failure because signal connection interrupted
closing signal connection ... "reason": "TRANSPORT_FAILURE"
The participant is then closed with:
participant closing ... "reason": "PEER_CONNECTION_DISCONNECTED"
The RTP stats also show very high upstream audio packet loss before the participant is closed:
rtp stats ... "direction": "upstream", "packetsExpected": 264, "packetsSeenPrimary": 127, "packetsLost": 137, "packetLostPercentage": 51.893936, "rtt": 59
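The loss percentage in that line follows directly from the two packet counters. A quick check, assuming packetsLost is simply packetsExpected minus packetsSeenPrimary (my reading of the field names, not something confirmed against the LiveKit source):

```python
# Counters copied from the "rtp stats" log line above.
packets_expected = 264
packets_seen_primary = 127

# Assumption: lost = expected - seen (matches the logged packetsLost of 137).
packets_lost = packets_expected - packets_seen_primary
loss_pct = 100 * packets_lost / packets_expected

print(f"lost={packets_lost} loss={loss_pct:.4f}%")
```

So roughly every second upstream audio packet never reached the SFU during that reporting window.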
Before I enabled full debug logging in the LiveKit config, I had also seen errors like these in the logs:
dtls timeout
peerconnection disconnected
My setup is:
- LiveKit server: livekit/livekit-server:v1.10.1
- LiveKit JWT service: ghcr.io/element-hq/lk-jwt-service:0.4.4
- Synapse: ghcr.io/element-hq/synapse:v1.150.0
- Element Web: ghcr.io/element-hq/element-web:v1.12.6
- Deployment: Docker Compose
- Reverse proxy: nginx
- TLS is terminated at nginx for the LiveKit WebSocket/API endpoint
- MatrixRTC SFU URL: wss://<MATRIX_DOMAIN>/livekit/sfu
- MatrixRTC JWT URL: https://<MATRIX_DOMAIN>/livekit/jwt
- Client from the logs: Element X / LiveKit JS SDK 2.16.0, Android 13, Wi-Fi
LiveKit is started like this:
livekit:
  image: livekit/livekit-server:v1.10.1
  command: ["--config", "/etc/livekit/config.yaml", "--node-ip", "<PUBLIC_SERVER_IP>"]
  volumes:
    - ${LIVEKIT_DATA_PATH}:/etc/livekit:ro
    - ${LIVEKIT_TLS_CERT_DIR}:/etc/lk-certs:ro
  ports:
    - "7881:7881/tcp"
    - "50201-50501:50201-50501/udp"
    - "3478:3478/udp"
    - "5349:5349/tcp"
    - "50502-50601:50502-50601/udp"
The main RTC/TURN part of the LiveKit config is:
port: 7880
rtc:
  tcp_port: 7881
  node_ip: <PUBLIC_SERVER_IP>
  port_range_start: 50201
  port_range_end: 50501
  use_external_ip: false
  allow_tcp_fallback: true
  strict_acks: false
  reconnect_on_publication_error: true
  reconnect_on_subscription_error: true
  reconnect_on_data_channel_error: true
turn:
  enabled: true
  domain: <MATRIX_DOMAIN>
  udp_port: 3478
  tls_port: 5349
  relay_range_start: 50502
  relay_range_end: 50601
  external_tls: false
  cert_file: /etc/lk-certs/live/<MATRIX_DOMAIN>/fullchain.pem
  key_file: /etc/lk-certs/live/<MATRIX_DOMAIN>/privkey.pem
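To rule out a simple mismatch between this config and the Compose port mappings, the two can be compared mechanically. This is a minimal sketch using only the values quoted in this report. The labels in `REQUIRED` and the helper names are made up for the example, and it assumes host and container port numbers must be identical because LiveKit advertises the configured numbers to clients:

```python
# Port mappings copied from the Compose file in this report.
COMPOSE_PORTS = [
    "7881:7881/tcp",
    "50201-50501:50201-50501/udp",
    "3478:3478/udp",
    "5349:5349/tcp",
    "50502-50601:50502-50601/udp",
]

# Ports the LiveKit config in this report expects to be reachable.
# The key names are labels for this sketch only.
REQUIRED = {
    "rtc.tcp_port": (7881, 7881, "tcp"),
    "rtc.port_range": (50201, 50501, "udp"),
    "turn.udp_port": (3478, 3478, "udp"),
    "turn.tls_port": (5349, 5349, "tcp"),
    "turn.relay_range": (50502, 50601, "udp"),
}


def parse_span(text):
    """Turn '3478' or '50201-50501' into an inclusive (start, end) pair."""
    start, _, end = text.partition("-")
    return int(start), int(end or start)


def parse_mapping(mapping):
    """Split 'HOST:CONTAINER/proto' into (host_span, container_span, proto)."""
    ports, _, proto = mapping.partition("/")
    host, _, container = ports.partition(":")
    return parse_span(host), parse_span(container), proto or "tcp"


def is_published(start, end, proto, mappings):
    """True if some mapping exposes start..end with unchanged port numbers."""
    for host, container, mapped_proto in mappings:
        same_numbers = host == container
        covers = container[0] <= start and end <= container[1]
        if mapped_proto == proto and same_numbers and covers:
            return True
    return False


mappings = [parse_mapping(m) for m in COMPOSE_PORTS]
missing = [name for name, (start, end, proto) in REQUIRED.items()
           if not is_published(start, end, proto, mappings)]
print("not published:", missing)
```

With the values above, nothing is reported as missing, so the RTC and TURN relay ranges at least line up between the two files. This says nothing about host firewall rules or NAT in front of the server.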
The JWT service points clients to:
LIVEKIT_URL: "wss://<MATRIX_DOMAIN>/livekit/sfu"
nginx proxies /livekit/sfu to LiveKit port 7880 with WebSocket upgrade enabled:
location = /livekit/sfu {
    proxy_pass http://livekit_up/;
    proxy_http_version 1.1;
    proxy_send_timeout 3600s;
    proxy_read_timeout 3600s;
    proxy_buffering off;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}
Outcome
Expected:
The participant should remain connected after joining the call and publishing audio. Both publisher and subscriber PeerConnections should stay healthy.
Actual:
The call starts successfully, but after around 15 to 20 seconds it disconnects. On Android, Element X shows UNKNOWN_ERROR.
From the LiveKit logs, the sequence looks like this:
starting signal connection ... "Client":{"sdk":1,"version":"2.16.0","protocol":16,"os":"Android","os_version":"13","network":"wifi"}
sent signal response ... "iceServers": [{"urls": ["turn:<PUBLIC_SERVER_IP>:3478?transport=udp", "turns:<MATRIX_DOMAIN>:443?transport=tcp"]}]
ice connection state change ... "transport": "PUBLISHER", "state": "connected"
mediaTrack published ... "kind": "audio", "mime": "audio/opus"
quality drop ... "direction": "up", "reason": "packet", "packetLostPercentage": 39.86928
error reading data channel ... "label": "_lossy", "error": "abort chunk, with following errors: (User Initiated Abort: Close called)"
error reading data channel ... "label": "_reliable", "error": "abort chunk, with following errors: (User Initiated Abort: Close called)"
finishing WS connection ... "closedByClient": true
ice connection state change ... "transport": "SUBSCRIBER", "state": "failed"
peer connection state change ... "transport": "SUBSCRIBER", "state": "failed"
closing signal connection ... "reason": "TRANSPORT_FAILURE"
participant closing ... "reason": "PEER_CONNECTION_DISCONNECTED"
rtp stats ... "packetsExpected": 264, "packetsSeenPrimary": 127, "packetsLost": 137, "packetLostPercentage": 51.893936
I am not sure if this is just a network/NAT packet loss issue, a mistake in my LiveKit/TURN/nginx configuration, or an interoperability issue between Element X / MatrixRTC and this LiveKit setup.
The part I find most confusing is the order of events. The client seems to close the signal WebSocket first, then the subscriber ICE transport fails, and then LiveKit reports TRANSPORT_FAILURE / PEER_CONNECTION_DISCONNECTED.
Since Android only shows UNKNOWN_ERROR, it is difficult to tell from the client side what the real root cause is.
Any guidance on what I should check next would be appreciated, especially around TURN advertisement, nginx path setup, and whether this level of upstream packet loss would be enough to explain the Android UNKNOWN_ERROR.
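On the TURN advertisement specifically, one thing that can be checked from the data already in this report is whether the advertised `iceServers` URLs use ports that the LiveKit container actually publishes. A rough sketch, assuming the URL shapes seen in the log above. `parse_turn_url` is a hypothetical helper, not part of any LiveKit tooling:

```python
import re

# URLs copied from the "sent signal response" log line in this report.
ADVERTISED = [
    "turn:<PUBLIC_SERVER_IP>:3478?transport=udp",
    "turns:<MATRIX_DOMAIN>:443?transport=tcp",
]

# Single ports published by the LiveKit container in the Compose file.
PUBLISHED = {("udp", 3478), ("tcp", 5349), ("tcp", 7881)}

TURN_URL = re.compile(
    r"^(?P<scheme>turns?):(?P<host>[^?]+):(?P<port>\d+)"
    r"(?:\?transport=(?P<transport>udp|tcp))?$"
)


def parse_turn_url(url):
    """Return (scheme, host, port, transport) for a turn:/turns: URL."""
    match = TURN_URL.match(url)
    if match is None:
        raise ValueError(f"not a TURN URL: {url!r}")
    # Without an explicit transport, turn defaults to UDP and turns to TCP.
    default = "tcp" if match["scheme"] == "turns" else "udp"
    transport = match["transport"] or default
    return match["scheme"], match["host"], int(match["port"]), transport


not_published = []
for url in ADVERTISED:
    _scheme, _host, port, transport = parse_turn_url(url)
    if (transport, port) not in PUBLISHED:
        not_published.append(url)

print("advertised but not published by the container:", not_published)
```

With these values, only the `turns:` URL on port 443 is flagged: the Compose file publishes 5349/tcp for TURN/TLS, not 443. Port 443 may be handled by something outside the snippets shown here (nginx, for example), so this is only a pointer to what I should verify next, not a diagnosis.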
Operating system
Android 13
Browser information
Not using a browser directly.
The client is Element X on Android. The LiveKit server logs identify the client as:
LiveKit JS SDK 2.16.0
OS: Android 13
Network: Wi-Fi
URL for webapp
Private self-hosted Matrix / Element deployment.
Element Web image: ghcr.io/element-hq/element-web:v1.12.6
LiveKit JWT URL: https://<MATRIX_DOMAIN>/livekit/jwt
LiveKit SFU URL: wss://<MATRIX_DOMAIN>/livekit/sfu
Will you send logs?
Yes.
I can provide redacted LiveKit debug logs from the server side. I have not submitted client debug logs yet, but I can try to reproduce the issue again and submit feedback with debug logs linked to this issue.