A Go library for SIP calling and RTP media. Register with a SIP trunk or PBX, or accept calls directly as a SIP server — and get decoded PCM audio frames through Go channels.
Also available in Rust.
- Status | Scope and Limitations | Tested Against | Use Cases
- Quick Start | Connection Modes | Working with Audio
- Features | Call States | Call Control | Media Pipeline
- Configuration | RTP Port Range | NAT Traversal | Opus Codec
- Testing | Example App | Logging | Stack | Roadmap
xphone is in active development and used in internal production workloads. APIs may change between minor versions. If you're evaluating, start with the Quick Start below or the demos repo.
xphone is a voice data-plane library — SIP signaling and RTP media. It is not a telephony platform.
You are responsible for:
- Billing, number provisioning, and call routing rules
- Recording storage and playback infrastructure
- High availability, persistence, and failover
- Rate limiting, authentication, and abuse prevention at the application level
Security boundaries:
- SRTP uses SDES key exchange only. DTLS-SRTP is not supported — xphone cannot interop with WebRTC endpoints that require it.
- TLS is supported for SIP transport. See Configuration for transport options.
- There is no built-in authentication layer for your application — xphone authenticates to SIP servers, not your end users.
Codec constraints:
- Opus and G.729 require CGO and external C libraries. The default build is CGO-free (G.711, G.722 only).
- PCM sample rate is fixed at 8 kHz (narrowband) or 16 kHz (G.722 wideband). There is no configurable sample rate.
| Category | Tested with |
|---|---|
| SIP trunks | Telnyx, Twilio SIP, VoIP.ms, Vonage |
| PBXes | Asterisk, FreeSWITCH, 3CX |
| Integration tests | fakepbx (in-process, no Docker) + xpbx (Dockerized Asterisk) in CI |
| Unit tests | MockPhone & MockCall — full Phone/Call interface mocks |
This is not a comprehensive compatibility matrix. If you hit issues with a provider or PBX not listed here, please open an issue.
- AI voice agents — pipe call audio directly into your STT/LLM/TTS pipeline without a telephony platform
- Softphones and click-to-call — embed SIP calling into any Go application against a trunk or PBX
- Call recording and monitoring — tap the PCM audio stream for transcription, analysis, or storage
- Outbound dialers — programmatic dialing with DTMF detection for IVR automation
- Unit-testable call flows — MockPhone and MockCall let you test every call branch without a SIP server
go get github.com/x-phone/xphone-go

Requires Go 1.23+.
package main
import (
"context"
"fmt"
"log"
xphone "github.com/x-phone/xphone-go"
)
func main() {
phone := xphone.New(
xphone.WithCredentials("1001", "secret", "sip.telnyx.com"),
xphone.WithRTPPorts(10000, 20000),
)
phone.OnRegistered(func() {
fmt.Println("Registered -- ready to receive calls")
})
phone.OnIncoming(func(call xphone.Call) {
fmt.Printf("Incoming call from %s\n", call.From())
call.Accept()
go func() {
for frame := range call.PCMReader() {
// frame is []int16, mono, 8000 Hz, 160 samples (20ms)
transcribe(frame)
}
}()
})
if err := phone.Connect(context.Background()); err != nil {
log.Fatal(err)
}
select {}
}

PCM format: []int16, mono, 8000 Hz, 160 samples per frame (20ms). This is the standard input format for most speech-to-text APIs.
call, err := phone.Dial(ctx, "+15551234567",
xphone.WithEarlyMedia(),
xphone.WithDialTimeout(30 * time.Second),
)
if err != nil {
log.Fatal(err)
}
go func() {
for frame := range call.PCMReader() {
processAudio(frame)
}
}()

Dial accepts a full SIP URI ("sip:1002@pbx.example.com") or just the user part ("1002"), in which case your configured SIP server is used.
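The two target forms that Dial accepts can be illustrated with a small normalization helper. This is only a sketch of the idea: `normalizeTarget` is a hypothetical function, not part of xphone, and the library's real parsing rules (for example, how it treats `sips:` or `tel:` targets) are not described here.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeTarget is a hypothetical helper, not part of xphone. It expands a
// bare user part into a SIP URI on defaultHost and leaves full URIs untouched.
func normalizeTarget(target, defaultHost string) string {
	if strings.HasPrefix(target, "sip:") || strings.HasPrefix(target, "sips:") {
		return target
	}
	return "sip:" + target + "@" + defaultHost
}

func main() {
	fmt.Println(normalizeTarget("1002", "pbx.example.com"))
	fmt.Println(normalizeTarget("sip:1002@other.example.com", "pbx.example.com"))
}
```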
xphone supports two ways to connect to the SIP world. Both produce the same Call interface — accept, end, DTMF, PCMReader/Writer are identical.
Registers with a SIP server like a normal endpoint. Use this with SIP trunks (Telnyx, Vonage), PBXes (Asterisk, FreeSWITCH), or any SIP registrar. No PBX is required — you can register directly with a SIP trunk provider:
phone := xphone.New(
xphone.WithCredentials("1001", "secret", "sip.telnyx.com"),
)
phone.OnIncoming(func(call xphone.Call) { call.Accept() })
phone.Connect(ctx)

Accepts and places calls directly with trusted SIP peers, with no registration required. Use this when trunk providers send INVITEs to your public IP, or when a PBX routes calls to your application:
server := xphone.NewServer(xphone.ServerConfig{
Listen: "0.0.0.0:5080",
RTPPortMin: 10000,
RTPPortMax: 20000,
RTPAddress: "203.0.113.1", // your public IP
Peers: []xphone.PeerConfig{
{Name: "twilio", Hosts: []string{"54.172.60.0/30", "54.244.51.0/30"}},
{Name: "office-pbx", Host: "192.168.1.10"},
},
})
server.OnIncoming(func(call xphone.Call) { call.Accept() })
server.Listen(ctx)

Peers are authenticated by IP/CIDR or SIP digest auth. Per-peer codec and RTP address overrides are supported.
For zero-downtime deploys, use ServerConfig.Listener with a pre-bound socket (e.g., with SO_REUSEPORT):
conn, err := net.ListenPacket("udp4", "0.0.0.0:5080")
if err != nil {
log.Fatal(err)
}
// To set SO_REUSEPORT before binding, create the socket with net.ListenConfig and a Control function instead
server := xphone.NewServer(xphone.ServerConfig{
Listener: conn,
// ...
})
server.Listen(ctx)

Which mode? Use Phone when you register to a SIP server (most setups). Use Server when SIP peers send INVITEs directly to your application (Twilio SIP Trunk, direct PBX routing, peer-to-peer).
xphone exposes audio as a stream of PCM frames through Go channels.
| Property | Value |
|---|---|
| Encoding | 16-bit signed PCM |
| Channels | Mono |
| Sample rate | 8000 Hz |
| Samples per frame | 160 |
| Frame duration | 20ms |
call.PCMReader() returns a <-chan []int16. Each receive gives you one 20ms frame of decoded audio from the remote party:
go func() {
for frame := range call.PCMReader() {
sendToSTT(frame)
}
// channel closes when the call ends
}()

Important: Read frames promptly. The inbound buffer holds 256 frames (~5 seconds). If you fall behind, the oldest frames are silently dropped.
call.PCMWriter() returns a chan<- []int16. Send one 20ms frame at a time:
go func() {
ticker := time.NewTicker(20 * time.Millisecond)
defer ticker.Stop()
for range ticker.C {
frame := getNextTTSFrame() // []int16, 160 samples
select {
case call.PCMWriter() <- frame:
default:
// outbound buffer full -- frame dropped, keep going
}
}
}()

Important: PCMWriter() sends each buffer as an RTP packet immediately, so the caller must provide frames at real-time rate (one 160-sample frame every 20ms). For TTS or file playback, use PacedPCMWriter() instead.
call.PacedPCMWriter() accepts arbitrary-length PCM buffers and handles framing + pacing internally:
ttsAudio := elevenLabs.Synthesize("Hello, how can I help you?")
call.PacedPCMWriter() <- ttsAudio

For lower-level control (pre-encoded audio, custom codecs, or RTP header inspection):
go func() {
for pkt := range call.RTPReader() {
processRTP(pkt) // *rtp.Packet (pion/rtp)
}
}()
call.RTPWriter() <- myRTPPacket
RTPWriter and PCMWriter are mutually exclusive: if you write to RTPWriter, PCMWriter is ignored for that call.
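For readers who want to see what "framing" an arbitrary-length buffer involves, as PacedPCMWriter is described as doing, the following sketch splits PCM into 160-sample frames. It is not xphone's implementation: `splitFrames` is a hypothetical helper, and zero-padding the final partial frame with silence is an assumption about how a short tail could be handled.

```go
package main

import "fmt"

const frameSamples = 160 // 20ms at 8000 Hz

// splitFrames cuts pcm into fixed-size frames. The final frame is zero-padded
// (silence) so that every frame has exactly frameSamples samples.
func splitFrames(pcm []int16) [][]int16 {
	var frames [][]int16
	for start := 0; start < len(pcm); start += frameSamples {
		frame := make([]int16, frameSamples)
		copy(frame, pcm[start:]) // copies at most frameSamples samples
		frames = append(frames, frame)
	}
	return frames
}

func main() {
	pcm := make([]int16, 400) // 50ms of audio
	frames := splitFrames(pcm)
	fmt.Println(len(frames), len(frames[len(frames)-1])) // 3 160
}
```

Pacing would then mean sending one of these frames every 20ms, as the ticker loop in the PCMWriter example does.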
func pcmToFloat32(frame []int16) []float32 {
out := make([]float32, len(frame))
for i, s := range frame {
out[i] = float32(s) / 32768.0
}
return out
}

- SIP registration with auto-reconnect and keepalive
- Inbound and outbound calls
- Hold / resume (re-INVITE)
- Blind transfer (REFER) and attended transfer (REFER with Replaces, RFC 3891)
- Call waiting (Phone.Calls() API)
- Session timers (RFC 4028)
- Mute / unmute
- 302 redirect following
- Early media (183 Session Progress)
- ReplaceAudioWriter: atomic audio source swap (e.g., music on hold)
- Outbound proxy routing (WithOutboundProxy): single next-hop for all signaling (REGISTER, SUBSCRIBE, MESSAGE, INVITE)
- Separate outbound credentials (WithOutboundCredentials)
- Separate digest auth identity (WithAuthUsername): for PBXes like 3CX where the Authentication ID differs from the extension
- RFC 3581 rport (WithNAT): for phones behind NAT; PBX replies come back to the NAT-mapped source port
- Custom headers on outbound INVITEs (WithHeader, DialOptions.CustomHeaders)
- Server.DialURI: dial arbitrary SIP URIs without pre-configured peers
- Transfer failure surfaced via EndedByTransferFailed end reason
- RFC 4733 (RTP telephone-events)
- SIP INFO (RFC 2976)
- G.711 u-law (PCMU), G.711 A-law (PCMA) — built-in
- G.722 wideband — built-in
- Opus: optional, requires CGO + libopus (-tags opus)
- G.729: optional, requires CGO + bcg729 (-tags g729)
- Jitter buffer
- H.264 (RFC 6184) and VP8 (RFC 7741)
- Depacketizer/packetizer pipeline
- Mid-call video upgrade/downgrade (re-INVITE)
- VideoReader / VideoWriter / VideoRTPReader / VideoRTPWriter
- RTCP PLI/FIR for keyframe requests
- SRTP (AES_CM_128_HMAC_SHA1_80) with SDES key exchange
- SRTP replay protection (RFC 3711)
- SRTCP encryption (RFC 3711 §3.4)
- Key material zeroization
- Separate SRTP contexts for audio and video
- TCP and TLS SIP transport
- STUN NAT traversal (RFC 5389)
- TURN relay for symmetric NAT (RFC 5766)
- ICE-Lite (RFC 8445 §2.2)
- RTCP Sender/Receiver Reports (RFC 3550)
- SIP MESSAGE (RFC 3428)
- SIP SUBSCRIBE/NOTIFY (RFC 6665)
- MWI / voicemail notification (RFC 3842)
- BLF / Busy Lamp Field monitoring
- SIP Presence (RFC 3856)
- MockPhone and MockCall — full interface mocks for unit testing
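Of the features listed above, RFC 4733 DTMF is easy to show on the wire: each telephone-event RTP payload is four bytes. The sketch below encodes one such payload following the RFC's layout. `encodeTelephoneEvent` is an illustrative function, not xphone's encoder, and the volume and duration values in `main` are arbitrary examples.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"strings"
)

// dtmfEvents maps a digit to its RFC 4733 event code by position:
// 0-9 -> 0-9, * -> 10, # -> 11, A-D -> 12-15.
const dtmfEvents = "0123456789*#ABCD"

// encodeTelephoneEvent builds the 4-byte telephone-event payload:
// event (8 bits), E bit, R bit, volume (6 bits), duration (16 bits, in RTP timestamp units).
func encodeTelephoneEvent(digit byte, end bool, volume uint8, duration uint16) ([]byte, error) {
	code := strings.IndexByte(dtmfEvents, digit)
	if code < 0 {
		return nil, fmt.Errorf("unsupported DTMF digit %q", digit)
	}
	buf := make([]byte, 4)
	buf[0] = byte(code)
	buf[1] = volume & 0x3F // R bit stays 0
	if end {
		buf[1] |= 0x80 // E bit marks the end of the event
	}
	binary.BigEndian.PutUint16(buf[2:], duration)
	return buf, nil
}

func main() {
	// Digit 5, end of event, volume 10, duration 800 (100ms at an 8000 Hz clock).
	p, err := encodeTelephoneEvent('5', true, 10, 800)
	if err != nil {
		panic(err)
	}
	fmt.Printf("% x\n", p) // 05 8a 03 20
}
```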
Idle -> Ringing (inbound) or Dialing (outbound)
-> RemoteRinging -> Active <-> OnHold -> Ended
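The diagram can be written down as a transition table, which is handy for asserting in tests that a call flow never makes an impossible jump. The sketch below is one reading of the diagram, with assumptions stated loudly: the `State` type and constants are local stand-ins rather than xphone's `CallState`, Ringing is assumed to go straight to Active, Dialing is assumed to be able to skip RemoteRinging, and every non-terminal state except Idle is assumed to be able to reach Ended.

```go
package main

import "fmt"

// State is a local stand-in for the library's call state type.
type State string

const (
	Idle          State = "Idle"
	Ringing       State = "Ringing"
	Dialing       State = "Dialing"
	RemoteRinging State = "RemoteRinging"
	Active        State = "Active"
	OnHold        State = "OnHold"
	Ended         State = "Ended"
)

// transitions encodes one reading of the diagram. The edges into Ended from
// pre-answer states and the Dialing -> Active shortcut are assumptions.
var transitions = map[State][]State{
	Idle:          {Ringing, Dialing},
	Ringing:       {Active, Ended},
	Dialing:       {RemoteRinging, Active, Ended},
	RemoteRinging: {Active, Ended},
	Active:        {OnHold, Ended},
	OnHold:        {Active, Ended},
}

// canTransition reports whether the table allows moving from one state to another.
func canTransition(from, to State) bool {
	for _, s := range transitions[from] {
		if s == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition(Active, OnHold)) // true
	fmt.Println(canTransition(Ended, Active))  // false
}
```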
call.OnState(func(state xphone.CallState) {
fmt.Printf("State: %v\n", state)
})
call.OnEnded(func(reason xphone.EndReason) {
fmt.Printf("Ended: %v\n", reason)
})

call.Hold()
call.Resume()
call.BlindTransfer("sip:1003@pbx.example.com")
callA.AttendedTransfer(callB)
call.Mute()
call.Unmute()
call.SendDTMF("5")
call.OnDTMF(func(digit string) {
fmt.Printf("Received: %s\n", digit)
})
// Mid-call video upgrade
call.AddVideo(xphone.VideoCodecH264, xphone.VideoCodecVP8)
call.OnVideoRequest(func(req *xphone.VideoUpgradeRequest) {
req.Accept()
})
call.OnVideo(func() {
// read frames from call.VideoReader()
})
phone.SendMessage(ctx, "sip:1002@pbx", "Hello!")

Inbound:
SIP Trunk -> RTP/UDP -> RTPRawReader (pre-jitter)
-> Jitter Buffer -> RTPReader (post-jitter)
-> Codec Decode -> PCMReader ([]int16)
Outbound (mutually exclusive):
PCMWriter -> Codec Encode -> RTP/UDP -> SIP Trunk (caller paces at 20ms)
PacedPCMWriter -> Auto-frame + 20ms ticker -> Codec Encode -> RTP/UDP -> SIP Trunk
RTPWriter -> RTP/UDP -> SIP Trunk (raw mode — PCMWriter ignored)
Inbound:
SIP Trunk -> RTP/UDP -> Depacketizer (H.264/VP8) -> VideoReader (NAL units / frames)
-> VideoRTPReader (raw video RTP packets)
Outbound (mutually exclusive):
VideoWriter -> Packetizer (H.264/VP8) -> RTP/UDP -> SIP Trunk
VideoRTPWriter -> RTP/UDP -> SIP Trunk (raw mode)
Video uses a separate RTP port and independent SRTP contexts. RTCP PLI/FIR requests trigger keyframe generation on the sender side.
All channels are buffered (256 entries). Inbound taps drop oldest on overflow; outbound writers drop newest. Audio frames are 160 samples at 8000 Hz = 20ms. Video frames carry codec-specific NAL units (H.264) or encoded frames (VP8).
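The two overflow policies mentioned above, drop-oldest for inbound taps and drop-newest for outbound writers, can be sketched with plain buffered channels. This is an illustration of the policies rather than xphone's code; it uses `int` items and a capacity of 3 instead of audio frames and 256 entries so the result is easy to read, and `pushDropOldest` is intended for a single producer.

```go
package main

import "fmt"

// pushDropNewest tries to enqueue v; if the buffer is full, v itself is discarded.
func pushDropNewest(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

// pushDropOldest enqueues v, evicting the oldest queued item when the buffer is full.
func pushDropOldest(ch chan int, v int) {
	for {
		select {
		case ch <- v:
			return
		default:
		}
		select {
		case <-ch: // evict the oldest item, then retry the send
		default: // a consumer drained the buffer first; just retry
		}
	}
}

// drain empties the channel without blocking and returns what was queued.
func drain(ch chan int) []int {
	var out []int
	for len(ch) > 0 {
		out = append(out, <-ch)
	}
	return out
}

func main() {
	oldest := make(chan int, 3)
	newest := make(chan int, 3)
	for i := 1; i <= 5; i++ {
		pushDropOldest(oldest, i)
		pushDropNewest(newest, i)
	}
	fmt.Println(drain(oldest)) // [3 4 5]
	fmt.Println(drain(newest)) // [1 2 3]
}
```

Drop-oldest keeps a slow reader close to real time, while drop-newest preserves the order of audio that is already queued for sending.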
phone := xphone.New(
xphone.WithCredentials("1001", "secret", "pbx.example.com"),
xphone.WithTransport("udp", nil), // "udp" | "tcp" | "tls"
xphone.WithRTPPorts(10000, 20000), // RTP port range
xphone.WithCodecs(xphone.CodecOpus, xphone.CodecPCMU), // codec preference
xphone.WithJitterBuffer(50 * time.Millisecond),
xphone.WithMediaTimeout(30 * time.Second),
xphone.WithNATKeepalive(25 * time.Second),
xphone.WithStunServer("stun.l.google.com:19302"),
xphone.WithSRTP(),
xphone.WithDtmfMode(xphone.DtmfModeRFC4733), // or DtmfModeSIPInfo
xphone.WithICE(true), // ICE-Lite
xphone.WithTurnServer("turn.example.com:3478"),
xphone.WithTurnCredentials("user", "pass"),
xphone.WithOutboundProxy("sip:proxy.example.com:5060"), // single next-hop for all SIP signaling
xphone.WithOutboundCredentials("trunk-user", "trunk-pass"), // separate INVITE auth
xphone.WithAuthUsername("auth-id"), // 3CX-style: digest username != SIP AOR
xphone.WithNAT(), // enable rport (RFC 3581) when behind NAT
xphone.WithLogger(slog.Default()),
)

See pkg.go.dev for all options.
Each active call requires an even-numbered UDP port for RTP audio. Configure an explicit range for production deployments behind firewalls:
phone := xphone.New(
xphone.WithCredentials("1001", "secret", "sip.telnyx.com"),
xphone.WithRTPPorts(10000, 20000),
)

Only even ports are used (per RTP spec). Maximum concurrent audio-only calls = (max - min) / 2.
| Range | Even ports | Max concurrent calls |
|---|---|---|
| 10000–10100 | 50 | ~50 |
| 10000–12000 | 1000 | ~1000 |
| 10000–20000 | 5000 | ~5000 |
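The table rows follow directly from the formula above. A small sketch for sizing a range before writing firewall rules (`maxAudioCalls` is an illustrative helper, not an xphone function):

```go
package main

import "fmt"

// maxAudioCalls applies the formula from the text: (max - min) / 2.
func maxAudioCalls(minPort, maxPort int) int {
	if maxPort <= minPort {
		return 0
	}
	return (maxPort - minPort) / 2
}

func main() {
	for _, r := range [][2]int{{10000, 10100}, {10000, 12000}, {10000, 20000}} {
		fmt.Printf("%d-%d: %d calls\n", r[0], r[1], maxAudioCalls(r[0], r[1]))
	}
}
```

Since video uses a separate RTP port, calls with video would consume the range faster than this audio-only estimate.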
When ports run out: inbound calls receive a 500 Internal Server Error and outbound dials fail with an error. Widen the range before investigating SIP server configuration.
If WithRTPPorts is not called, the OS assigns ephemeral ports. This works for development but is impractical in production where firewall rules need a known range.
Discovers your public IP via a STUN Binding Request:
phone := xphone.New(
xphone.WithCredentials("1001", "secret", "sip.telnyx.com"),
xphone.WithStunServer("stun.l.google.com:19302"),
)

For environments where STUN alone fails (cloud VMs, corporate firewalls):
phone := xphone.New(
xphone.WithCredentials("1001", "secret", "sip.telnyx.com"),
xphone.WithTurnServer("turn.example.com:3478"),
xphone.WithTurnCredentials("user", "pass"),
)

SDP-level candidate negotiation (RFC 8445 §2.2):
phone := xphone.New(
xphone.WithCredentials("1001", "secret", "sip.telnyx.com"),
xphone.WithICE(true),
xphone.WithStunServer("stun.l.google.com:19302"),
)

Only enable STUN/TURN/ICE when the SIP server is on the public internet. Do not enable it when connecting via VPN or private network.
Opus is optional and requires CGO + libopus. The default build is CGO-free.
# Debian / Ubuntu
sudo apt-get install libopus-dev libopusfile-dev
# macOS
brew install opus opusfile

go build -tags opus ./...
go test -tags opus ./...

phone := xphone.New(
xphone.WithCredentials("1001", "secret", "sip.telnyx.com"),
xphone.WithCodecs(xphone.CodecOpus, xphone.CodecPCMU),
)

Opus runs at 8kHz natively, so no resampling is needed. PCM frames remain []int16, mono, 160 samples (20ms). RTP timestamps use a 48kHz clock per RFC 7587.
Without the opus build tag, CodecOpus is accepted in configuration but will not be negotiated.
MockPhone and MockCall implement the Phone and Call interfaces:
phone := xphone.NewMockPhone()
phone.Connect(context.Background())
phone.OnIncoming(func(c xphone.Call) {
c.Accept()
})
phone.SimulateIncoming("sip:1001@pbx")
assert.Equal(t, xphone.StateActive, phone.LastCall().State())

call := xphone.NewMockCall()
call.Accept()
call.SendDTMF("5")
assert.Equal(t, []string{"5"}, call.SentDTMF())
call.SimulateDTMF("9")
call.InjectRTP(pkt)

go test -v -run TestFakePBX ./...
go test -v -run TestServerFakePBX ./...

cd testutil/docker && docker compose up -d
go test -tags=integration -v -count=1 ./...
cd testutil/docker && docker compose down

examples/sipcli is a terminal SIP client with registration, calls, hold, resume, DTMF, mute, transfer, echo mode, and speaker output:
cd examples/sipcli
go run . -profile myserver # from ~/.sipcli.yaml
go run . -server pbx.example.com -user 1001 -pass secret

xphone uses Go's log/slog:
phone := xphone.New(
xphone.WithLogger(slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
Level: slog.LevelDebug,
}))),
)
// Silence all library logs
phone := xphone.New(
xphone.WithLogger(slog.New(slog.NewTextHandler(io.Discard, nil))),
)

If no logger is provided, slog.Default() is used.
| Layer | Implementation |
|---|---|
| SIP Signaling | sipgo |
| RTP / SRTP | pion/rtp + built-in SRTP (AES_CM_128_HMAC_SHA1_80) |
| G.711 / G.722 | Built-in (PCMU, PCMA) + gotranspile/g722 |
| G.729 | AoiEnoki/bcg729 (optional, -tags g729) |
| Opus | hraban/opus (optional, -tags opus) |
| H.264 / VP8 | Built-in packetizer/depacketizer (RFC 6184, RFC 7741) |
| RTCP | Built-in (RFC 3550 SR/RR + PLI/FIR) |
| Jitter Buffer | Built-in |
| STUN | Built-in (RFC 5389) |
| TURN | Built-in (RFC 5766) |
| ICE-Lite | Built-in (RFC 8445 §2.2) |
| TUI (sipcli) | bubbletea + lipgloss |
- DTLS-SRTP key exchange (WebRTC interop)
- Full ICE (RFC 5245)
MIT