feat(irc): IRC server — Egghead as a real IRCd#6

Merged
mwunsch merged 22 commits into main from irc-server
May 5, 2026

Conversation

@mwunsch mwunsch commented May 1, 2026

Summary

An IRC server that turns Egghead into a real IRCd. Connect any IRC client (ERC, irssi, weechat, Goguma, …) to your running egghead serve and you get the full TUI experience over the wire: chat with agents in channels, address them by @nick, watch tool calls and /pass yields render as CTCP ACTION, run the slash-command palette as native verbs (/save, /handoff scout, /mute platypus, /context, …), DM agents directly.

The IRC server runs as a peer of the web endpoint inside egghead serve — same supervision tree, always-on by default with sensible defaults (127.0.0.1:6667), config-driven, opt-out via EGGHEAD_IRC=false / --no-irc.

What works

| Surface | Wire |
| --- | --- |
| Wire protocol | RFC 2812 + IRCv3 message tags. NICK/USER/PASS/CAP/PING/QUIT/JOIN/PART/PRIVMSG/NAMES/MODE/LIST/KICK/INVITE/WHOIS/MOTD/VERSION/TIME/CHATHISTORY |
| Slash palette | HANDOFF / SAVE / CONTINUE / HALT / MUTE / UNMUTE / CONTEXT as native IRC verbs (ERC's /handoff scout Just Works) |
| Agent action layer | /pass → CTCP ACTION with PassActions flavor, tool calls → * scout uses read_file path=..., agent join/leave → synthetic JOIN/PART, system notices → IRC NOTICE, halt/continue → NOTICE, paragraph-buffered streaming PRIVMSG, tool denials → CTCP ACTION |
| #default alias | Per-connection alias resolved transparently: JOIN echoes back #default, all wire output for that room uses #default, inbound traffic to #default routes to the canonical room. Strict clients (ERC) only open a buffer when the JOIN echo matches the request. |
| DM | PRIVMSG scout :hi → Egghead.prompt/3 ephemerally; response back as PRIVMSG from the agent |
| WHOIS | Model + context % in 311 realname, tags + capabilities in 312, channel memberships in 319, 335 RPL_WHOISBOT for distinct rendering |
| /context | Claude Code-style per-agent context-window snapshot with bar charts |
| Synthesized TOPIC | "N agents"; re-emitted on roster changes |
| IRCv3 caps | server-time, batch, chathistory advertised; ACK/NAK on REQ |
| CHATHISTORY | LATEST / BEFORE / AFTER / AROUND / BETWEEN with BATCH-wrapped responses, CHATHISTORY=100 advertised in ISUPPORT |
| Scrollback replay | Last 50 transcript messages on JOIN with @time tags (gated on server-time cap) |
| Keepalive | Server-initiated PING/PONG every 90s with token-correlated round-trip latency in debug logs |

Architecture notes

  • Hand-rolled parser/encoder (Egghead.IRC.Protocol) — IRC's wire grammar is tiny and the only Hex package candidate (derpydev/irc) was MIT but dormant since 2017 and didn't cover IRCv3 tags. ~270 lines, fully tested.
  • Per-room forwarder Tasks (Egghead.IRC.Connection) — Phoenix.PubSub doesn't tell handle_info which topic delivered a message, so each joined channel gets a Task that re-tags events with the originating room_id before forwarding to the connection. Linked, dies with the socket.
  • Thousand Island for the TCP listener — modern, OTP-native, MIT, used by Bandit. The only third-party dep added.
  • read_timeout: :infinity on the listener — TI's default 60s read_timeout was killing connections silently before the keepalive ticks; our PING/PONG cycle handles dead-client detection on a tighter, observable cycle.
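
To give a feel for how small the wire grammar in the first bullet is, here is a rough Python sketch of a line parser. It is an illustration only, not the Elixir in Egghead.IRC.Protocol: `parse_line` and the returned dict shape are assumed names, and IRCv3 tag-value escaping is left out.

```python
def parse_line(line: str) -> dict:
    """Parse one IRC line into tags, prefix, command, params and trailing."""
    line = line.rstrip("\r\n")
    tags, prefix, trailing = {}, None, None

    # IRCv3 message tags: "@key=value;key2" before everything else.
    if line.startswith("@"):
        raw_tags, _, line = line[1:].partition(" ")
        for item in raw_tags.split(";"):
            key, _, value = item.partition("=")
            tags[key] = value
        line = line.lstrip(" ")

    # Optional source prefix: ":nick!user@host".
    if line.startswith(":"):
        prefix, _, line = line[1:].partition(" ")
        line = line.lstrip(" ")

    # Trailing parameter: everything after the first " :".
    # None means "no trailing at all"; "" means an explicit empty one.
    if " :" in line:
        line, _, trailing = line.partition(" :")

    parts = line.split()
    command = parts[0].upper() if parts else ""
    return {"tags": tags, "prefix": prefix, "command": command,
            "params": parts[1:], "trailing": trailing}
```

Keeping `trailing` separate from `params` is what lets an encoder reproduce the original wire form, and it is also why `LIST :` and bare `LIST` parse differently.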

Configuration

# config.yml — both blocks fully optional
irc:
  port: 6667                                    # default
  bind: 127.0.0.1                               # default
  hostname: irc.local                           # optional; defaults to gethostname()
  password: "{env:EGGHEAD_IRC_PASSWORD}"        # optional shared password

# shell: flags and environment overrides
egghead serve                                   # web + IRC, both on defaults
egghead serve --port 8080 --irc-port 6697       # both, custom ports
egghead serve --no-irc                          # web only
egghead serve --no-web                          # IRC only
EGGHEAD_IRC=false egghead serve                 # web only via env
EGGHEAD_IRC_BIND=0.0.0.0 egghead serve          # IRC on all interfaces (LAN)

What's deliberately shelved

Captured at design/irc-shelved in the user's record store:

  • Multi-user / multi-machine identity — today every IRC speaker submits as the system Egghead.User.current(). Echo suppression uses a sender.name == state.nick heuristic (works for one human, breaks for two). Multi-user identity, per-conn-room reverse index in IRC.Registry, NAMES with humans, WHOIS channels for humans, INVITE-to-human routing — all queued for when actual multi-user use becomes real.
  • TLS on 6697 — IRC is loopback / LAN / tailnet only in our model; TLS adds a bring-your-own-cert config story without buying anything for the dominant use case.
  • egghead irc status CLI — originally proposed but scrapped for the same reason egghead irc was scrapped: IRC isn't transport-different from web; it's just another network surface.

Bonus fix

Includes one node discovery fix that surfaced during testing: EGGHEAD_SERVER is now an absolute directive — if set but unreachable, discover_server/0 returns :none instead of silently falling through to the local epmd lookup (which would attach to the wrong server entirely). Same root cause as a flaky test in node_test.exs.

Diff

24 files changed, +6079 / -38. New code lives in lib/egghead/irc/ and test/egghead/irc/; 118 IRC tests covering parser, nick map, registration, channel ops, action events, slash verbs, ops commands, DM, server-time + scrollback, CHATHISTORY, auth.

Test plan

  • Restart egghead serve
  • Connect ERC: M-x erc (plain; TLS is shelved, so erc-tls will not connect) to localhost:6667, nick = your $USER
  • /list — see all rooms, default marked with topic hint
  • /join #default — buffer opens, NAMES shows agents with +v voice prefix
  • Type into the channel — agents respond
  • /whois cassowary (or your agent) — model, context %, tags, caps, channels
  • /context — per-agent context-window snapshot
  • /msg cassowary tell me something — DM round-trip
  • /save, /handoff cassowary, /mute platypus — verb dispatch
  • /quote CHATHISTORY LATEST #default * 50 — backlog fetch (any client)
  • Stay idle for 5+ minutes — connection should NOT drop (keepalive working)
  • In TUI: type into the same room. Agents speaking there show up in the IRC channel buffer too.

🤖 Generated with Claude Code

mwunsch and others added 22 commits April 30, 2026 16:23
…ults

Bring up `Egghead.IRC.Server` as a peer of the web endpoint: a Thousand
Island TCP listener whose per-connection handlers translate IRC commands
to/from `Egghead.Chat.Room` and the room PubSub stream. Mirrors the web
side end-to-end — defaults baked in, no `irc:` block required, opt out
via `--no-irc` / `EGGHEAD_IRC=false`, override port via `--irc-port` /
`EGGHEAD_IRC_PORT`.

Wire side covers RFC 2812 + IRCv3 message tags: NICK / USER / PASS /
CAP / PING / QUIT / JOIN / PART / PRIVMSG / NAMES / MODE / LIST plus
the welcome burst (001..005 ISUPPORT) and the usual error numerics.
Hand-rolled parser/encoder with explicit `params` vs `trailing` so the
wire form stays canonical. Per-conn nicks claim slots in a unique
`Registry`; agent ids are projected onto IRC nicks via `NickMap`.

Inbound PRIVMSG to `#room` is routed through `Room.send_message/2`,
auto-creating the room if needed; `:agent_message` PubSub events flow
back as PRIVMSG from the agent's nick. Self-echoes are suppressed
(M1 caveat documented — single-user identity for now).

55 IRC tests (parser, nick map, end-to-end against a live socket,
auth happy/sad path, MODE, LIST, echo suppression).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`discover_server/0` used a `with` chain: env probe → config → local
epmd. When EGGHEAD_SERVER was set but unreachable, the env probe
returned `:none` and the chain fell through to the local epmd lookup.
On a dev box with `egghead serve` already running, that silently
attached to the *local* `egghead_server@localhost` instead of the
named remote — wrong instance, hidden error.

Also surfaced as a flaky test (`node_test.exs:154`) that passed in
clean environments and failed whenever the operator had an egghead
server running. Same root cause.

Now: if EGGHEAD_SERVER is set, only that target is consulted. Unreachable
returns `:none`. The implicit "find a local server" path runs only
when the operator hasn't named one explicitly.
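
The new rule can be sketched in Python as follows. This is not the project's `discover_server/0`: `probe` and `find_local` are hypothetical callables standing in for the reachability check and the local epmd lookup, the config step of the chain is omitted, and `None` stands in for `:none`.

```python
import os
from typing import Callable, Optional


def discover_server(probe: Callable[[str], bool],
                    find_local: Callable[[], Optional[str]],
                    env: Optional[dict] = None) -> Optional[str]:
    """Pick a server node, treating EGGHEAD_SERVER as an absolute directive."""
    env = os.environ if env is None else env
    named = env.get("EGGHEAD_SERVER")
    if named:
        # A named target is the only candidate: never fall through
        # to the local lookup when it is unreachable.
        return named if probe(named) else None
    # Implicit discovery runs only when nothing was named.
    return find_local()
```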

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surface every action-shaped room event over IRC with the right wire
form: CTCP ACTION for /pass and tool calls (the `/me` line style),
synthetic JOIN/PART for agent roster changes (so client nicklists
update live), NOTICE for system messages and halt/continue, and
paragraph-buffered PRIVMSG for mid-stream agent text.

Also adds a per-room forwarder Task per joined channel. Phoenix.PubSub
doesn't tell handle_info which topic delivered a message, so when a
connection is in multiple rooms simultaneously some events (passed,
joined, left, system_notice, continued — which don't carry room_id in
the payload) would be ambiguous. Each Task subscribes to one room
and re-tags messages `{:room_event, room_id, original}` before
forwarding to the connection. Linked, so socket close kills them.
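
A loose Python analogue of the forwarder idea, using threads and queues in place of Tasks and PubSub. `start_forwarder` and the `None` stop sentinel are assumptions made for the sketch; only the `("room_event", room_id, original)` shape comes from the description above.

```python
import queue
import threading


def start_forwarder(room_id: str, room_events: queue.Queue,
                    connection_inbox: queue.Queue) -> threading.Thread:
    """Forward every event from one room, tagged with that room's id."""
    def run():
        while True:
            event = room_events.get()
            if event is None:  # sentinel: stop forwarding
                return
            # Re-tag so the connection knows which room delivered it,
            # even when the payload itself carries no room id.
            connection_inbox.put(("room_event", room_id, event))

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread
```

One forwarder per joined room means the connection never has to guess the origin of a payload such as `("passed", "scout")`.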

Streaming buffer per (room, agent): accumulate deltas, flush on `\n\n`
boundaries as PRIVMSG, keep trailing partial; on the final
:agent_message emit only the unflushed tail so streamed paragraphs
don't double up.
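
The buffering rule reads roughly like this in Python. The method names echo the later StreamBuffer module (absorb / take_tail), but the signatures and the dict-based state are guesses, not the real implementation.

```python
class StreamBuffer:
    """Per-(room, agent) paragraph buffer for streamed agent text."""

    def __init__(self):
        self._pending = {}  # (room_id, agent_id) -> unflushed text

    def absorb(self, room_id, agent_id, delta):
        """Add a delta; return the complete paragraphs ready to send."""
        text = self._pending.get((room_id, agent_id), "") + delta
        # Everything before the last blank-line boundary is complete;
        # the final piece is a partial paragraph and stays buffered.
        *complete, tail = text.split("\n\n")
        self._pending[(room_id, agent_id)] = tail
        return [p for p in complete if p.strip()]

    def take_tail(self, room_id, agent_id):
        """Return and clear the unflushed remainder for the final message."""
        return self._pending.pop((room_id, agent_id), "")
```

Each string returned by `absorb` would go out as its own PRIVMSG, and `take_tail` supplies the only text still owed when the final message arrives.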

Tool input rendered as `key=value` pairs with values truncated to 40
chars, matching the TUI format. /pass picks fresh flavor from
PassActions per render — TUI and IRC may pick different lines for the
same event, intentional atmospheric divergence.
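
As a rough illustration of the tool-input rendering, in Python: the 40-character cutoff is from the text above, while the `...` suffix and both function names are assumptions. The `\x01ACTION ...\x01` framing is the standard CTCP ACTION payload carried inside a PRIVMSG.

```python
def format_tool_input(tool_input: dict, max_len: int = 40) -> str:
    """Render tool input as space-separated key=value pairs."""
    parts = []
    for key, value in tool_input.items():
        text = str(value)
        if len(text) > max_len:
            text = text[:max_len] + "..."  # suffix is an assumption
        parts.append(f"{key}={text}")
    return " ".join(parts)


def ctcp_action(text: str) -> str:
    """Wrap text as a CTCP ACTION payload (what clients show as /me)."""
    return "\x01ACTION " + text + "\x01"
```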

15 new integration tests cover every event type and multi-room routing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two small wins for IRC discoverability:

LIST now sets the topic field on the default room's 322 entry to
"Default room — also reachable as #default" so it's visually
distinguishable from the other rooms (which carry empty topics).
Most clients render the topic next to the channel name, so the
default lights up at a glance.

JOIN #default resolves to whatever the live default room id is
(via Egghead.default_room/0) before subscribing. The JOIN echo uses
the canonical room id, not #default, so the IRC client's membership
state matches the channel name PRIVMSGs and NAMES will arrive on —
otherwise events for #chat-2026-04-30-N would land on a channel the
client doesn't think it joined.

If no default room exists (only possible mid-startup or in tests),
#default falls through to a normal "default" room name and the
auto-create path takes over.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two fixes for the live IRC client experience.

Per-connection channel aliases. When a client joins via #default we
echo JOIN with #default (not the canonical room id), advertise NAMES
under #default, route inbound PRIVMSG #default to the canonical
room, and deliver outbound events back tagged as #default. ERC and
other strict clients only open a channel buffer when the JOIN echo
matches the channel they asked for — echoing the canonical name was
silently failing to open the buffer at all.

The alias is per-connection (`%{room_id => "#alias"}` in connection
state); two clients in the same room can have different views.
Outbound emitters (PRIVMSG, NOTICE, CTCP ACTION, agent JOIN/PART)
all flow through `display_channel/2`; inbound (PRIVMSG, PART, NAMES,
MODE) flow through `target_to_room_id/2` which checks aliases first.
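
The two directions of the alias lookup, sketched in Python. The function names mirror `display_channel/2` and `target_to_room_id/2`, but the bodies are guesses; in particular, treating the channel name as `#` plus the room id is an assumption.

```python
def display_channel(aliases: dict, room_id: str) -> str:
    """Outbound: the channel name this connection knows the room by."""
    return aliases.get(room_id, "#" + room_id)


def target_to_room_id(aliases: dict, target: str) -> str:
    """Inbound: aliases win; otherwise drop the leading '#'."""
    for room_id, alias in aliases.items():
        if alias == target:
            return room_id
    return target[1:] if target.startswith("#") else target
```

Because `aliases` lives in per-connection state, one client can see a room as #default while another sees the same room under its canonical name.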

LIST entries now report the actual agent count instead of hardcoded
zero. ERC and weechat hide channels at 0 users in list-mode by
default — populating an honest count makes active rooms visible.
Connected humans aren't counted yet (M4 will index IRC connections
by room via the Registry).

20 new IRC tests in total covering the alias join/part/privmsg/event
round-trip and member count.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ERC and some other clients send `LIST :` — LIST with a `:`-introduced
empty trailing param — when the user types /list with no filter. Our
parser put the empty string in `trailing`, so `Message.args/1`
returned `[""]`, not `[]`. The handler then treated `[""]` as a
filter set and matched zero rooms. Bare `LIST` (no `:`) worked
because args returned `[]` and hit the :all branch. Indistinguishable
on the wire from "no filter," so collapse them into one path: flatten
args, drop empties, treat the empty result as match-everything.
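
A small Python sketch of the collapse. `list_filters` and `rooms_to_list` are hypothetical names; the comma split follows RFC 2812's comma-separated channel list for LIST.

```python
def list_filters(params, trailing):
    """Flatten LIST arguments and drop empties; [] means match everything."""
    raw = list(params) + ([trailing] if trailing is not None else [])
    names = [name for arg in raw for name in arg.split(",")]
    return [name for name in names if name]


def rooms_to_list(rooms, filters):
    """Apply the filters; an empty filter list selects every room."""
    return list(rooms) if not filters else [r for r in rooms if r in filters]
```

With this shape, `LIST :` (trailing `""`) and bare `LIST` (no trailing) both produce `[]` and take the same path.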

Regression test sends `LIST :` verbatim and asserts at least one
322 RPL_LIST entry comes back.

Also keeps a low-volume debug log of `handle_list`'s view (filters,
rooms, matching, default) — useful future diagnostic at narrow scope,
fires once per LIST call. Flip the logger to debug level to see it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…, /context

ERC's `/save`, `/handoff scout`, `/mute platypus`, `/halt`, `/continue`,
`/unmute`, `/context` now all work natively. Each verb is a real IRC
command on the wire (HANDOFF, SAVE, CONTINUE, HALT, MUTE, UNMUTE,
CONTEXT) routed through dispatch/2. No wire surprises — clients send
the verb verbatim, server handles it.

Channel resolution: each verb takes an optional `#channel` first arg.
Without one, defaults to the user's only joined channel; if they're
in multiple, returns a NOTICE asking for the channel explicitly. So
`/save` Just Works™ in the common case.
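
Channel resolution for a verb could look like the following Python sketch. The name matches the `resolve_room_arg` helper mentioned later in this PR, but the signature, the notice wording, and the handling of a user who has joined no channel are all assumptions.

```python
def resolve_room_arg(args, joined):
    """Return (channel, remaining_args), or (None, notice_text) on failure."""
    if args and args[0].startswith("#"):
        return args[0], args[1:]      # explicit channel wins
    if len(joined) == 1:
        return joined[0], args        # common case: only one channel
    if not joined:
        return None, "Join a channel first."  # assumed behaviour
    return None, "You are in several channels; name one explicitly."
```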

Agent resolution: MUTE/UNMUTE/HANDOFF resolve the nick argument by
looking up the room's roster (since IRC nicks drop the `agents/`
namespace prefix). Unknown nick → 401 ERR_NOSUCHNICK.

HANDOFF runs an LLM call and can take seconds; spawned in a Task so
the connection stays responsive. Completion reported as a NOTICE.

Synthesized TOPIC. On JOIN we emit 332 RPL_TOPIC + 333 RPL_TOPICWHOTIME
with a tiny "N agents" line — visible in the channel header in most
clients. Re-emitted as a `TOPIC` line whenever an agent joins or
leaves the room. Order of welcome burst is now JOIN → TOPIC → NAMES
(common server convention; keeps 366 as the final marker).

CONTEXT renders Claude Code-style context-window snapshot per agent:
  Context windows:
    cassowary  ▓▓░░░░░░░░░░░░░░  23%  (45,234 / 200,000)
    fonz       ▓▓▓▓▓▓░░░░░░░░░░  41%  (82,001 / 200,000)
NOTICE-delivered (one line per agent), padded for column alignment.
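
For illustration, one way to render such a line in Python. The 16-cell width and the two-space gaps are read off the sample above; the rounding rule is not stated, so this sketch floors the fill and may differ from the server by a cell. Both function names are hypothetical.

```python
def context_bar(used: int, limit: int, width: int = 16) -> str:
    """Fixed-width bar; fill is floored (an assumption about rounding)."""
    filled = min(width, used * width // limit) if limit > 0 else 0
    return "▓" * filled + "░" * (width - filled)


def context_line(nick: str, used: int, limit: int, nick_width: int) -> str:
    """One NOTICE line: padded nick, bar, percentage, raw token counts."""
    pct = round(100 * used / limit) if limit > 0 else 0
    bar = context_bar(used, limit)
    return f"{nick.ljust(nick_width)}  {bar}  {pct}%  ({used:,} / {limit:,})"
```

Here `nick_width` would be the length of the longest nick in the room, which is what keeps the columns aligned across lines.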

11 new tests. Numerics: 331/332/333.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round out the IRC verb set so the server feels like a real network.

KICK and INVITE map directly to Room.leave/2 and Room.join/2. No
channel-op gating (no +o flag, no 482 ERR_CHANOPRIVSNEEDED) — Egghead
rooms are flat and any participant shapes the roster, paralleling
the TUI. KICK reaches the room's roster via resolve_agent_in_room/2;
INVITE walks the global agent registry via resolve_agent_anywhere/1
so you can summon any defined agent into any room.

WHOIS surfaces real metadata. For an agent: model + display name in
the realname field (311), then 320 RPL_WHOISSPECIAL lines for context
window utilization, disposition, and capabilities, plus a 319 with
all rooms the agent is currently in. For a connected human nick:
basic identity, no extras yet (M4 will add joined channels via the
IRC.Registry walk).

MOTD, VERSION, TIME — five-minute cosmetics. MOTD is a small static
greeting pointing at the rest of the verb palette. VERSION returns
351 with the egghead version + a comment line. TIME returns the
server's UTC clock in ISO-8601.

Egghead.Agent.list_agents/0 needs the record store running, which
isn't always true (test mode, degraded headless). All callers now
go through safe_list_agents/0 so a missing record store reports
"no such nick" instead of crashing the connection.

New numerics: 311, 312, 317, 318, 319, 320, 341, 351, 372, 375,
376, 391, 401 (no_such_nick), 442, 443. 9 new tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WHOIS for an agent emitted three RPL_WHOISSPECIAL (320) lines for
context %, disposition, and capabilities. ERC and several other
clients hard-code 320 as "is identified to services" regardless of
trailing text — so all three rendered identically, dropping the data.

Repacked: model + context % into the realname field (311), then
disposition + capabilities into 312 RPL_WHOISSERVER's info field, and
added 335 RPL_WHOISBOT so modern clients visually mark agents as
bots. Removed the 320 spam. The 320 helper stays in numerics for
callers that genuinely want the literal-services semantic; just
documented its surprise rendering.

INVITE crashed the connection when the user typed
`invite cassowary #default` without first having joined via
`#default`. `target_to_room_id/2` only resolved `#default` from the
per-connection alias map, so it fell through to the literal
"default" room id. Subsequent Room.join("default", ...) hit a dead
GenServer name and propagated :no_proc up through the connection.

Now `target_to_room_id/2` resolves `#default` against
`Egghead.default_room/0` as a global fallback. INVITE also calls
`ensure_room/1` defensively (parity with JOIN's auto-create) so a
verb against a brand-new channel name doesn't no-op or crash.

Two regression tests cover both.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…face tags

The earlier "system prompt as NOTICE" report from live use traced to
this: `agent.disposition` is `record.body || ""` (per
lib/egghead/record/agent.ex:76), i.e. the entire multi-paragraph
system prompt — not a short one-line label like the field name
suggested. Packing it into 312 RPL_WHOISSERVER's info trailing made
the client wrap it across many `*** localhost ...` lines, which read
as a separate notice burst. There was no second leak.

Drops disposition from WHOIS entirely. Surfaces `agent.tags` instead
— short, descriptive labels that fit on one line. Capabilities still
included.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PRIVMSG to a nick (not a channel) is now a real direct message.
For an agent nick: spawn a Task that calls Egghead.prompt/3 with
the message body, then send the response back as a PRIVMSG from
the agent's nick to the asker. Async — the connection keeps
processing other commands while the LLM thinks. Multi-line
responses split into one PRIVMSG per line.

For an unknown nick: 401 ERR_NOSUCHNICK.

For a connected human nick: NOTICE that human-to-human DMs are
M4 — needs the per-conn message-forwarding infrastructure that
multi-user identity work brings in.

Replaces the M1 placeholder NOTICE ("DMs to agents are not wired
yet") that was lying to users since b034d71. Three integration
tests cover the unknown / human / agent dispatch paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Symptom: ERC reported "Connection failed! Re-establishing…" with
nothing in the IRC log. Cause: Thousand Island's default 60s
read_timeout. With no inbound bytes for 60s (idle ERC, no client-
initiated PING for whatever reason), TI cleanly closes the socket.
ERC sees TCP close and reconnects. Our handle_close was silent so
the disconnect didn't appear in any IRC-prefixed log line.

Real IRC servers handle this with bidirectional PING/PONG keepalive,
which both keeps the socket warm and detects dead clients. Adding
the server-initiated half:

- After registration, schedule `:keepalive_tick` every 90s.
- On tick: if `awaiting_pong?` is still true from the previous tick,
  the client is dead — log it and stop the connection. Otherwise,
  send a fresh PING with the server name as token, set the flag,
  re-arm.
- Inbound PONG (from any prior PING) clears `awaiting_pong?`.

Also added an info log in `handle_close` so future disconnects are
visible at a glance — easier than grep'ing Thousand Island's own
module-prefixed lines.

Test for the PONG-clears-flag path; the timer-driven side is too
slow to unit-test (90s interval) and is exercised in live use.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Capability denials (`{:agent_tool_denied, room_id, agent_id, tool,
input, denial}` from `Egghead.Agent.Session.broadcast_denial/4`)
were dropped silently by the IRC layer. Now they render as a CTCP
ACTION line in the channel, parallel to the tool-call action:

  * scout uses net_get url=https://api.example.com/...
  * scout was denied net_get: no grant for net.get on api.example.com

Just the human-readable `denial.message` — the structured
request/grants payload stays in the operator log. Falls back to
"denied" if the broadcast comes through with a nil denial.

Two tests cover the populated and nil cases.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tone refs

IRCv3 `server-time` capability — clients negotiate via CAP REQ
:server-time, server tags every outbound chat-shaped line (PRIVMSG,
NOTICE, CTCP ACTION) with `@time=ISO-8601`. Clients render the message
at that timestamp instead of "now," which is what makes scrollback
replay meaningful.

On JOIN, if the client negotiated server-time, the last 50 transcript
messages from `Room.get_transcript/1` get replayed as backdated
PRIVMSGs — each tagged with its original timestamp. The IRC client
slots them into scrollback at the correct historical moment instead
of at the current time. `/pass` markers are skipped (transcript
convention, not scrollback content). Without server-time, no replay
fires (avoiding a confusing burst of duplicate-looking messages).

CAP negotiation now advertises real capabilities in CAP LS, ACKs
supported requests, and NAKs the whole batch atomically when any
requested cap is unsupported (per IRCv3 spec).
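
The all-or-nothing REQ rule can be sketched like this in Python. The capability set is taken from the PR summary; `handle_cap_req` and the reply strings (shown without a server prefix) are illustrative, not the server's actual code.

```python
SUPPORTED_CAPS = {"server-time", "batch", "chathistory"}


def handle_cap_req(requested: str, enabled: set, nick: str = "*") -> str:
    """ACK and apply every cap, or NAK the whole request unchanged."""
    caps = requested.split()
    # A leading "-" asks to disable a capability (IRCv3 CAP syntax).
    names = [c[1:] if c.startswith("-") else c for c in caps]
    if not all(name in SUPPORTED_CAPS for name in names):
        return f"CAP {nick} NAK :{requested}"  # nothing is applied
    for cap, name in zip(caps, names):
        if cap.startswith("-"):
            enabled.discard(name)
        else:
            enabled.add(name)
    return f"CAP {nick} ACK :{requested}"
```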

Sweep: removed M1/M2/M3/M3.5/M3.6/M4 milestone references from
docstrings and inline comments throughout `lib/egghead/irc/` and
`test/egghead/irc/`. Comments now describe what the code does, not
when it was added. Renamed test files for the same reason:

  m2_actions_test  →  action_events_test
  m3_verbs_test    →  slash_verbs_test
  m3_5_extras_test →  ops_commands_test
  m3_6_dm_test     →  dm_test
  m4_server_time_test → server_time_test

Module names updated to match. Milestones survive only in the
`design/irc-shelved` record, where they're appropriate context.

108 IRC tests; 8 new for CAP negotiation, time tagging, and
scrollback replay.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…mplification

IRCv3 chathistory extension. Five subcommands for fetching arbitrary
windows of room history:

  CHATHISTORY LATEST  <target> *                       <limit>
  CHATHISTORY BEFORE  <target> timestamp=<iso>         <limit>
  CHATHISTORY AFTER   <target> timestamp=<iso>         <limit>
  CHATHISTORY AROUND  <target> timestamp=<iso>         <limit>
  CHATHISTORY BETWEEN <target> timestamp=<iso> timestamp=<iso> <limit>

Each response is wrapped in a `BATCH +id chathistory <target>` …
`BATCH -id` envelope so clients distinguish history from live traffic.
Every replayed PRIVMSG carries `@time=` (original timestamp) and
`@batch=id` tags. Limit clamped at 100 (advertised in ISUPPORT 005
as `CHATHISTORY=100`). Errors surface as IRCv3 standard-replies
`FAIL CHATHISTORY <code> :<desc>` lines (NEED_MORE_PARAMS,
INVALID_PARAMS, INVALID_TARGET, UNKNOWN_COMMAND).
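
A Python sketch of the LATEST case, showing the BATCH envelope, the per-line tags, and the limit clamp. `chathistory_latest` and the tuple layout of `messages` are assumptions, and multi-line message bodies are not handled.

```python
import secrets

MAX_LIMIT = 100  # matches the advertised CHATHISTORY=100


def chathistory_latest(server, target, messages, limit, batch_id=None):
    """Build wire lines for the newest `limit` messages.

    `messages` is an oldest-first list of (iso_time, nick, text).
    """
    limit = max(0, min(limit, MAX_LIMIT))
    batch_id = batch_id or secrets.token_hex(4)
    window = messages[-limit:] if limit else []

    lines = [f":{server} BATCH +{batch_id} chathistory {target}"]
    for iso_time, nick, text in window:
        lines.append(
            f"@batch={batch_id};time={iso_time} :{nick} PRIVMSG {target} :{text}"
        )
    lines.append(f":{server} BATCH -{batch_id}")
    return lines
```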

Two new IRCv3 caps in CAP LS: `batch` (envelope) and `chathistory`
(verb). `server-time` was already there.

Also in this commit:

- Simplified the inbound-PING response. Was emitting
  `:server PONG server :token` (server name in both middle params
  and trailing). Some clients (ERC included) compare the trailing
  token to what they sent; the redundant middle param confused the
  match in some configurations. Now `:server PONG :token`.

- Added Logger.info on outbound PINGs (server keepalive) and
  inbound PONGs so the keepalive cycle is visible in the log when
  diagnosing dropouts. Tail with:
    tail -f ~/.local/state/egghead/egghead.log | grep IRC

- Tail of the milestone-comment sweep — the test files renamed in
  e03b187 had M-prefixed module names and docstrings still in place
  (only the file paths were renamed). Now updated to match.

10 new chathistory tests; 118 IRC tests total.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…gging

Two issues from live use.

History replay was silently no-op'ing for any client that didn't
negotiate the IRCv3 `server-time` capability — which includes ERC's
default config. The thinking was "without timestamps the messages
would render at 'now' and look like a duplicate flood," but that
silently drops the entire chat history visible to TUI users from
the IRC client's view. Now: always replay on JOIN. With server-time,
each line carries an `@time` tag and lands at the right historical
moment; without it, the lines render at the current timestamp as a
recap. Recap > nothing.

Lifecycle logging so disconnect / reconnect cycles are visible:

  IRC: connection opened from <ip>:<port>
  IRC: registered nick=<n> caps=[...]  (history-replay-on-join with @time tags|with current timestamps)
  IRC: -> PING (<n>)               ← already there
  IRC: <- PONG (<n>)               ← already there
  IRC: connection closed (nick=<n>)
  IRC: connection error (nick=<n>, reason=<term>)   ← new (handle_error callback)

The error path catches socket-level failures (RST, EPIPE, etc.) that
handle_close doesn't get called for — useful for diagnosing client
disconnects that aren't clean QUITs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…connect bug

Live ERC was disconnecting every ~60-66 seconds with no log lines
explaining why. Server-side: connection opens, registers with
`caps=[]` (ERC doesn't negotiate IRCv3 caps by default), then 60s
later a fresh connection opens for the same nick. No PING/PONG ever
fires (our 90s tick), no `connection closed` line, no
`connection error` line.

Root cause: Thousand Island has a default `read_timeout: 60_000` —
if no inbound bytes arrive on the socket in that window, it kills
the GenServer with `{:stop, {:shutdown, :timeout}, ...}` from its
default `:timeout` info handler. That bypasses our `handle_close`
callback entirely (no logs), and it always fires before our 90s
keepalive tick ever runs.

Set `read_timeout: :infinity` on the listener. Our PING/PONG
keepalive (`:keepalive_tick` at 90s) is the proper dead-connection
detector — it both keeps the socket warm AND logs every PING/PONG
plus drops with a clear "no PONG within Nms" message when a client
genuinely stops responding. The TI read_timeout was redundant at
best and silently destructive in practice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walks back the unconditional replay from df87254. Without server-time,
the lines render at the current timestamp — actively misleading for
content that's actually old (a transcript line from yesterday looks
like a fresh message arriving "now"). Better to show nothing on JOIN
than to fake the timing.

Clients that want history without server-time can use the CHATHISTORY
verb (gated on its own cap) for explicit on-demand fetches; clients
that have neither cap don't get history on JOIN. The capability
contract becomes honest: opt into the IRCv3 features, get the IRCv3
features.

Registration log line clarified to say "history-replay-on-join: yes"
or "no — needs server-time cap" so the gate is visible.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bumped to debug level (silent at default :info) so a steady stream
of keepalive traffic doesn't drown the log. When debug is enabled
the lines now carry per-PING tokens and round-trip latency:

  IRC: -> PING m token=Hk2QRz
  IRC: <- PONG m token=Hk2QRz rtt=12ms

Server-initiated PINGs use a fresh `:crypto.strong_rand_bytes/1`
token per request, stored alongside the send timestamp in connection
state. Inbound PONG matches against that token and reports the
round-trip in milliseconds; mismatched / unsolicited PONGs note the
discrepancy too. Inbound PING from the client logs both the received
token and the response we send back — useful when ERC's own
`erc-server-send-ping-interval` is what's keeping the socket alive.

State field rename: `awaiting_pong?` (boolean) → `awaiting_pong_token`
(nil | binary) + `last_ping_sent_at` for the latency math. The
`dropping nick — no PONG` line now also names the token and the
exact wait time.
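
The token-correlated cycle, sketched in Python. The two field names follow the state rename described above; the `Keepalive` class, its methods, and the use of `secrets` in place of `:crypto.strong_rand_bytes/1` are assumptions for the sake of a runnable example.

```python
import secrets
import time


class Keepalive:
    """Server-initiated PING/PONG with per-PING tokens and RTT."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.awaiting_pong_token = None  # None | str
        self.last_ping_sent_at = None

    def tick(self):
        """Run once per interval: drop if the last PING went unanswered."""
        if self.awaiting_pong_token is not None:
            return ("drop", self.awaiting_pong_token)
        self.awaiting_pong_token = secrets.token_urlsafe(6)
        self.last_ping_sent_at = self.clock()
        return ("ping", self.awaiting_pong_token)

    def pong(self, token):
        """Return round-trip ms for a matching PONG, else None."""
        if token is None or token != self.awaiting_pong_token:
            return None  # mismatched or unsolicited PONG
        rtt_ms = round((self.clock() - self.last_ping_sent_at) * 1000)
        self.awaiting_pong_token = None
        return rtt_ms
```

Injecting `clock` keeps the latency math testable without waiting out a real 90s interval.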

Connection-level events (open / register / close / error / drop)
stay at info — those are low-volume and useful by default.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
connection.ex was carrying 2331 lines of mixed concerns. Pull out the
cleanly-isolable pieces:

- StreamBuffer — pure per-(room, agent) paragraph buffer for streaming
  agent output (absorb / take_tail / drop_room).
- Format — tool_input / context_bar / int rendering helpers.
- Channels — per-connection #default-style alias resolution
  (resolve_alias / display_channel / target_to_room_id).
- ChatHistory — the whole IRCv3 CHATHISTORY subprotocol
  (LATEST/BEFORE/AFTER/AROUND/BETWEEN + BATCH wrapping + timestamp
  parsing). Connection delegates with a 9-line context bundle so the
  module stays ignorant of sockets and cap negotiation.

connection.ex shrinks 2331 → 1959 lines. mix test green (1206 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second pass on connection.ex. Five more focused modules pulled out:

- Forwarder — promotes the per-room PubSub forwarder Task into a
  named module (start_link/2, stop/1).
- Agents — shared safe-list / find-by-nick / channels / find-in-room
  helpers used across DM, INVITE, KICK, /context, and Whois.
- Whois — full WHOIS handling (~110 lines: agent + human replies,
  cap/tag formatting, RPL_WHOISBOT). Connection delegates with a
  3-line context bundle.
- Wire — low-level emission helpers (write, send_message, send_privmsg,
  send_action, send_notice, time_tag, prefix, agent_prefix). Per-conn
  shorthands in connection.ex stay terse to keep the 100+ call sites
  unchanged.
- Verbs — the entire Egghead slash-command palette (SAVE/CONTINUE/
  HALT/MUTE/UNMUTE/HANDOFF/CONTEXT) plus their resolution helpers
  (with_room, with_room_and_agent, resolve_room_arg) and the
  /context render. Connection's dispatch case collapses 7 branches
  to one.

connection.ex shrinks 1959 → 1518 lines (down from 2331 before round 1
— a 35% reduction overall). mix test green (1206 tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Walks through connecting (ERC, irssi, weechat, nc), the
channels-as-rooms / nicks-as-agents projection, the four
addressing modes, the slash-verb-to-IRC-verb mapping
(SAVE/CONTINUE/HALT/MUTE/UNMUTE/HANDOFF/CONTEXT), KICK / INVITE /
WHOIS / CHATHISTORY behavior, the irc: config block plus
per-invocation overrides, the loopback-default security posture,
the IRCv3 caps the server actually negotiates, and the single-user
caveat.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mwunsch mwunsch merged commit 6e09947 into main May 5, 2026
1 check passed