feat(messaging): Discord channel read + summarize (on-demand + scheduled digest) #21
…(Fix 2)
The pre-push hook was running a Tier-1 satellite build and the build-config preset-matrix smoke, making push multi-minute (and tripping an SSH idle-drop). Move that coverage to where it belongs:
- satellite build is already the `satellite-build` CI job, so drop the duplicate.
- the server daemon link+start smoke is the `docker-build` job's `dawn --help` (the only stock-runner place the binary links: it pulls ONNX + Piper, which aren't apt packages). Annotated accordingly.
- the full ML preset matrix can't run on stock runners for the same reason, so ./tests/smoke_test.sh stays a developer/release-time check (documented in the hook header).
pre-push.hook 170->116 lines, now fast-only (tests-ci + ctest -L ci).
…led digest)
Friday can now read and summarize Discord channels the bot can see — on
demand ("catch me up on #general") or as a scheduled digest. Reading is
REST pull-only (no guild gateway firehose): newest-first backward
pagination with time-range (since/until) and an older-history cursor.
- Driver contract: optional list_readable_channels / read_history
(messaging_read_window_t) / cache-invalidate. Discord implements them;
the REST read path is split into messaging_discord_read.c for size.
- Engine: messaging_engine_read.c orchestrates discovery (5-min cache),
fuzzy name resolution, per-message injection filtering + [DATA]
envelope, and a tz-rendered transcript with a streaming char cap;
opts-struct APIs (read_channel / read_server).
- Tool: read_channel / read_server / list_discord_channels actions.
- str_fuzzy.{c,h} promoted to Layer 1 (shared with home_assistant);
scheduled_context.{c,h} carries the event owner onto scheduler threads.
Scheduled messaging is read-only. A per-action schedulability gate
(tool_metadata.validate_schedulable_action, one shared allowlist) rejects
send/manage actions at BOTH create time and fire time. The scheduler task
path now sets the scheduled-origin context, so the gate fires there too
and reads attribute to the real event owner, not user 1.
Loosen the memory_filter blocklist to stop benign-chat false positives.
Test: build clean; 35/35 test_tool_registry incl. the per-action gate;
new str_fuzzy + scheduled_context unit tests; full suite green; format clean.
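The per-action gate described above can be pictured as one allowlist lookup that both the create-time and fire-time paths call. The following is a minimal sketch under stated assumptions: `messaging_action_is_schedulable` and `k_schedulable_actions` are hypothetical names, the real hook is `tool_metadata.validate_schedulable_action` (its signature is not shown in this PR text), and the three entries are the read-only actions that a later commit in this PR names as the scheduled allowlist.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allowlist: the read-only messaging actions. */
static const char *const k_schedulable_actions[] = {
    "read_channel",
    "read_server",
    "list_discord_channels",
};

/* Returns true only for actions on the allowlist. NULL and unknown
 * actions are rejected, so a newly added action is unschedulable until
 * someone deliberately lists it. */
bool messaging_action_is_schedulable(const char *action) {
    if (action == NULL) {
        return false;
    }
    size_t n = sizeof(k_schedulable_actions) / sizeof(k_schedulable_actions[0]);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(action, k_schedulable_actions[i]) == 0) {
            return true;
        }
    }
    return false;
}
```

Because the check is deny-by-default and lives in one place, calling it at both schedule creation and schedule firing keeps the two paths from drifting apart.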
Address the five-agent review of the Discord read/summarize feature:
- SEC (HIGH): neutralize untrusted channel/server names at every transcript
site — a crafted guild name could break the [DATA] envelope and reach the
LLM as instructions. Shared sanitize_inline + strbuf_append_inline cover the
channel/server preambles, both disambiguation lists, and the list output.
- SEC: honor Discord 429 (CURLINFO_RETRY_AFTER, clamped 60s / default 5s) via
an s_rate_backoff_until gate that fast-fails reads without a network call, so
a sweep can't hammer the route into a bot-token ban.
- Clamp read_server's per-channel emit budget to the transcript cap so a
section can't overshoot it; break guild discovery at DC_MAX_CHANNELS instead
of fetching every remaining guild for dropped results.
- Provider-neutral find_read_capable_driver() replaces the hardcoded
find_driver("discord") in the read path.
- scheduler: return SUCCESS in scheduler_execute_task + explicit
no-early-return invariants around the scheduled-context set/clear.
- Drop a dead author ternary; fix the snowflake-size mirror comment.
Build clean; full suite green; format clean.
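To illustrate the 429 handling in the list above, here is a rough sketch of the clamp and the fast-fail gate. The 60s cap and 5s default come from the commit message; the function names and `s_backoff_until` (standing in for `s_rate_backoff_until`) are hypothetical, and the real code presumably feeds in the value from libcurl's `CURLINFO_RETRY_AFTER`, which reports zero when there is no usable header. A real implementation would also need a lock or atomic around the shared deadline.

```c
#include <stdbool.h>
#include <time.h>

#define BACKOFF_DEFAULT_S 5 /* used when no usable Retry-After is present */
#define BACKOFF_MAX_S 60    /* upper clamp on the server-supplied value */

/* Hypothetical stand-in for the s_rate_backoff_until gate. */
static time_t s_backoff_until = 0;

/* Map a Retry-After value in seconds (<= 0 meaning "absent") to a
 * bounded backoff interval. */
long backoff_seconds(long retry_after_s) {
    if (retry_after_s <= 0) {
        return BACKOFF_DEFAULT_S;
    }
    if (retry_after_s > BACKOFF_MAX_S) {
        return BACKOFF_MAX_S;
    }
    return retry_after_s;
}

/* Record a 429 observed at time `now`. */
void backoff_note_429(time_t now, long retry_after_s) {
    s_backoff_until = now + (time_t)backoff_seconds(retry_after_s);
}

/* True while reads should fast-fail without making a network call. */
bool backoff_active(time_t now) {
    return now < s_backoff_until;
}
```

Checking `backoff_active()` before building the request is what keeps a multi-channel sweep from retrying into the same rate-limited route.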
Pull request overview
Adds a pull-based Discord “read & summarize” capability to the messaging subsystem, including on-demand channel/server reads and scheduled digests, while tightening scheduled-action safety via a per-action schedulability gate and scheduled-origin user context propagation.
Changes:
- Extend the messaging driver contract + Discord driver to support readable-channel discovery and REST history reads (with caching, pagination, and 429 backoff).
- Add a new messaging-engine read path that resolves channels via shared fuzzy matching, filters/neutralizes untrusted content, and emits `[DATA]` transcripts for summarization.
- Introduce scheduled-origin thread-local context + per-action schedulability validation (create-time and fire-time), update tests/docs/hooks accordingly, and relax `memory_filter` patterns to reduce false positives.
Reviewed changes
Copilot reviewed 29 out of 29 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/test_tool_registry.c | Updates schedulability validation tests for the new per-action gate and signature change. |
| tests/test_str_fuzzy.c | Adds unit tests for the shared fuzzy matching helper. |
| tests/test_scheduled_context.c | Adds unit tests for scheduled-origin thread-local context. |
| tests/test_memory_filter.c | Updates tests to match the relaxed injection-pattern blocklist behavior. |
| tests/CMakeLists.txt | Registers new Unity unit test targets for str_fuzzy and scheduled_context. |
| src/tools/tool_registry.c | Adds tool_action parameter and enforces optional per-action schedulability gate. |
| src/tools/scheduler_tool.c | Passes tool_action through schedulability validation on schedule creation. |
| src/tools/messaging_tool.c | Adds read_channel / read_server / list_discord_channels, scheduled gating, and scheduled-origin user resolution. |
| src/tools/homeassistant_service.c | Migrates Home Assistant name matching to shared str_fuzzy. |
| src/messaging/messaging_engine.c | Adds read-path rate limiters and driver selection for read-capable providers. |
| src/messaging/messaging_engine_read.c | Implements discovery, fuzzy resolution, transcript shaping, filtering, and read/list APIs. |
| src/messaging/messaging_discord.c | Splits read path into separate TU, shares token/constants, wires driver hooks, and shuts down read path on teardown. |
| src/messaging/messaging_discord_read.c | Implements Discord REST discovery (cached) + history fetch with pagination and 429 backoff. |
| src/core/str_fuzzy.c | New shared fuzzy matcher implementation. |
| src/core/scheduler.c | Publishes scheduled-origin context around tool execution; validates steps with tool action. |
| src/core/scheduled_context.c | New thread-local scheduled-origin context implementation. |
| src/core/memory_filter.c | Relaxes blocklist patterns (removes always/never/whenever+verb and temporal phrases). |
| pre-push.hook | Narrows pre-push checks to fast CI-parity targets; defers heavier builds to GitHub Actions. |
| install-git-hooks.sh | Updates hook descriptions to match the streamlined pre-push behavior. |
| include/tools/tool_registry.h | Adds validate_schedulable_action to tool metadata and updates validation API signature/docs. |
| include/messaging/messaging_engine.h | Adds public APIs and option structs for channel/server read + channel listing. |
| include/messaging/messaging_engine_internal.h | Exposes read-path rate limiters and read-capable driver lookup internally. |
| include/messaging/messaging_driver.h | Extends driver contract with optional readable-channel listing, history reads, and cache invalidation. |
| include/messaging/messaging_discord_internal.h | Adds Discord internal shared constants/token and snowflake validator for split TUs. |
| include/core/str_fuzzy.h | Declares shared fuzzy matching helper and scoring constants. |
| include/core/scheduled_context.h | Declares scheduled-origin thread-local context API. |
| docs/MESSAGING_CHANNELS_SETUP.md | Documents Discord read/summarize workflows and scheduled digests. |
| CMakeLists.txt | Adds new core sources and enables new messaging read components under ENABLE_WEBUI. |
| .github/workflows/ci.yml | Clarifies docker-build as the daemon link+start smoke test home. |
| int score = str_fuzzy_score(cand_norm, needle); | ||
| if (score < MSG_READ_FUZZY_THRESHOLD) { | ||
| continue; | ||
| } | ||
| if (cand_n < MSG_READ_MAX_CANDIDATES) { | ||
| cand_idx[cand_n++] = i; | ||
| } | ||
| if (score > best_score) { | ||
| best_score = score; | ||
| best_idx = i; | ||
| best_count = 1; | ||
| } else if (score == best_score) { | ||
| best_count++; | ||
| } |
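The snippet above records every candidate over the threshold, while the follow-up commit in this PR says disambiguation should list only the best-score ties. One way that could look, as a sketch over precomputed scores: `collect_best_ties` is a hypothetical helper, not a function from the PR, and it takes an array of scores rather than calling `str_fuzzy_score` so it stays self-contained.

```c
#include <stddef.h>

/* Collect the indices whose score equals the best score at or above
 * `threshold`. Returns the number of ties found (which may exceed
 * `max_out`); at most `max_out` indices are written to `out`. */
size_t collect_best_ties(const int *scores, size_t n, int threshold,
                         size_t *out, size_t max_out) {
    int best = threshold - 1; /* sentinel: nothing has qualified yet */
    size_t ties = 0;
    for (size_t i = 0; i < n; i++) {
        if (scores[i] < threshold || scores[i] < best) {
            continue;
        }
        if (scores[i] > best) {
            best = scores[i];
            ties = 0; /* a strictly better score discards earlier candidates */
        }
        if (ties < max_out) {
            out[ties] = i;
        }
        ties++;
    }
    return ties;
}
```

A return value of 1 means an unambiguous match at `out[0]`; anything larger means the caller should ask the user to pick among the listed ties.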
| /* Parse the driver's read_history JSON (newest-first) into a heap array of | ||
| * displayable messages, applying type filtering, the injection filter, and | ||
| * delimiter neutralization. Returns count; *out set to a malloc'd array | ||
| * (caller frees each .author/.content then the array). *filtered_out counts | ||
| * messages dropped by the injection filter. */ |
| /* Optional 'before' older-history cursor: a message id (snowflake). Accept | ||
| * only digits so it can't be anything but an id. */ | ||
| const char *before_id = NULL; | ||
| struct json_object *before_obj = NULL; | ||
| if (json_object_object_get_ex(details, "before", &before_obj)) { | ||
| const char *b = json_object_get_string(before_obj); | ||
| if (b && b[0]) { | ||
| before_id = b; | ||
| for (const char *p = b; *p; p++) { | ||
| if (*p < '0' || *p > '9') { | ||
| before_id = NULL; /* not a bare id → ignore */ | ||
| break; | ||
| } | ||
| } | ||
| } | ||
| } |
| * Returns the text/announcement channels of every server the bot has joined, | ||
| * grouped by server, using the driver's discovery cache — does NOT fetch any | ||
| * message history, so it doesn't consume a read budget. Optional | ||
| * @p server_hint filters to one server (fuzzy). Discord-only in v1. |
| Because `read_channel` is a normal schedulable tool, you can ask for a recurring | ||
| digest delivered to a channel: | ||
|
|
||
| - "Every weekday at 8am, summarize #announcements and send it to my Discord DM." | ||
|
|
||
| The scheduler runs the read, the assistant summarizes, and the summary is | ||
| delivered to the channel you named (see [Delivering scheduled | ||
| events](#delivering-scheduled-events-to-a-channel)). Deliver digests to a **DM** | ||
| rather than back into a channel the digest itself reads, so tomorrow's digest | ||
| doesn't summarize today's. (Only `read_channel` is allowed to run from a | ||
| schedule — `send` and other actions require a live conversation.) |
| size_t i; | ||
| for (i = 0; s[i] != '\0'; i++) { | ||
| if (s[i] < '0' || s[i] > '9') { | ||
| return false; | ||
| } | ||
| } | ||
| return i <= DC_SNOWFLAKE_MAX_DIGITS; |
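The loop above scans the whole string before checking its length. A fail-fast variant, as the follow-up commit describes for `dc_is_valid_snowflake`, might look like the sketch below. The name `is_valid_snowflake_sketch` and the local constant are hypothetical; the 20-digit cap matches the PR text (an unsigned 64-bit value has at most 20 decimal digits), and rejecting the empty string is an assumption of this sketch, since the original snippet does not show how empty input is handled.

```c
#include <stdbool.h>
#include <stddef.h>

#define SNOWFLAKE_MAX_DIGITS 20 /* UINT64_MAX has 20 decimal digits */

/* Accept only 1..20 ASCII digits, bailing out as soon as the input is
 * known to be invalid instead of walking an arbitrarily long string. */
bool is_valid_snowflake_sketch(const char *s) {
    if (s == NULL || s[0] == '\0') {
        return false; /* assumption: an empty id is never valid */
    }
    for (size_t i = 0; s[i] != '\0'; i++) {
        if (i >= SNOWFLAKE_MAX_DIGITS) {
            return false; /* over-long: stop scanning here */
        }
        if (s[i] < '0' || s[i] > '9') {
            return false;
        }
    }
    return true;
}
```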
PR Summary by Qodo
feat(messaging): Discord channel read + summarize (on-demand + scheduled digest)
Walkthroughs
Description
• Adds REST pull-only Discord channel reading: read_channel, read_server, and list_discord_channels with fuzzy resolution and time bounds.
• Extends the messaging driver contract with optional discovery/history hooks; Discord implements them in a split read module.
• Hardens LLM injection safety for untrusted channel content via [DATA] envelope, delimiter/control-char neutralization, and per-message filtering.
• Makes scheduled messaging read-only via per-action schedulability gates enforced at schedule-create and fire time.
• Propagates scheduled task ownership into tool callbacks using new scheduled_context thread-local context.
• Promotes str_fuzzy to shared core and adds unit tests; trims pre-push to fast-only checks and documents CI smokes.
Diagram
graph TD
User(["User / Scheduler"]) --> MT["messaging_tool.c\nread_* actions"] --> ER["messaging_engine_read.c\nresolve · filter · transcript"] --> DR["messaging_discord_read.c\nREST read"]
ST["scheduler_tool.c\ncreate schedule"] --> TR["tool_registry.c\nvalidate_schedulable"] --> MT
SCH["scheduler.c\nfire schedule"] --> SC["scheduled_context.c\nthread-local user_id"] --> MT
ER --> SF["str_fuzzy.c\nscore names"] --> ER
ER --> MF["memory_filter.c\ncheck messages"] --> ER
DR --> DI["messaging_discord_internal.h\ntoken + snowflake"]
DD["messaging_discord.c\nsend/gateway"] --> DI
subgraph Legend
direction LR
_ext{{"External"}} ~~~ _mod["Module"] ~~~ _hdr(["Header/Infra"])
end
High-Level Assessment
The following are alternative approaches to this PR:
1. Gateway firehose + local message cache
2. Webhook-driven ingestion
3. Isolate summarization in a separate process/service
Recommendation: The pull-only REST design is the best fit for controlling LLM input surface: content is fetched only when requested or via an explicit scheduled digest. Given the deliberate memory_filter relaxation, reviewers should focus on the envelope + name sanitization + snowflake validation + rate limiting/backoff, which are now the primary injection/abuse mitigations.
File Changes
Enhancement (13)
Bug fix (1)
Refactor (5)
Documentation (2)
Other (8)
…ty, docs)
Qodo (3 real bugs):
- Sanitize message bodies (sanitize_inline), not just delimiters: an embedded newline could forge fake "[HH:MM] author:" lines inside the [DATA] envelope.
- scheduler_execute_task now runs tool_registry_validate_schedulable at fire time (cap + enabled + per-action gate), so the task path enforces the gate generically like briefings, not relying on each tool's internal check.
- Discord discovery: on a guilds-parse failure, fail instead of caching an empty channel list (was poisoning discovery for the 5-min TTL).
Copilot:
- Disambiguation lists only the best-score ties, not all threshold matches.
- 'before' cursor validated for length (<=20) with a clear error, not a generic driver-side "couldn't read".
- dc_is_valid_snowflake fails fast past the 20-digit cap.
- Doc fixes: list_discord_channels IS rate-limited; sweep cap ~30 not ~20; scheduled allowlist is read_channel/read_server/list_discord_channels.
Build clean; suite green; format clean.
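The fire-time check only attributes reads correctly if the scheduled-origin context is set around the tool call and always cleared afterwards. A minimal sketch of that set/call/clear shape using C11 `_Thread_local` follows. Every name here (`sched_ctx_set`, `sched_ctx_get`, `sched_ctx_clear`, `run_scheduled`, `report_owner`) is hypothetical; the actual `scheduled_context.{c,h}` API is not shown in this PR text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-thread record of who owns the scheduled event
 * currently being executed on this thread. */
static _Thread_local bool t_sched_active = false;
static _Thread_local int t_sched_user_id = 0;

void sched_ctx_set(int user_id) {
    t_sched_user_id = user_id;
    t_sched_active = true;
}

void sched_ctx_clear(void) {
    t_sched_active = false;
    t_sched_user_id = 0;
}

/* Returns true and writes the owner id if a scheduled context is active. */
bool sched_ctx_get(int *user_id_out) {
    if (!t_sched_active) {
        return false;
    }
    *user_id_out = t_sched_user_id;
    return true;
}

typedef int (*tool_fn)(void *arg);

/* Run a tool callback on behalf of `owner_id`. There is deliberately no
 * early return between set and clear, so the context cannot leak into
 * the next task that runs on this thread. */
int run_scheduled(int owner_id, tool_fn fn, void *arg) {
    sched_ctx_set(owner_id);
    int rc = fn(arg);
    sched_ctx_clear();
    return rc;
}

/* Example callback: stores the owner it sees into *(int *)arg. */
int report_owner(void *arg) {
    int uid = -1;
    if (!sched_ctx_get(&uid)) {
        return 1;
    }
    *(int *)arg = uid;
    return 0;
}
```

Because the storage is thread-local, a live conversation on another thread never observes the scheduled owner, and the tool can fall back to its normal user resolution when `sched_ctx_get` returns false.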
What
Lets the assistant read and summarize Discord channels the bot can see — on demand ("catch me up on #general", "summarize my server") and as a
scheduled digest. Reading is REST pull-only: no guild-message gateway firehose, so the LLM-input surface stays controlled and the bot only fetches when
explicitly asked.
How
- Driver contract: optional hooks (`list_readable_channels` / `read_history` / cache-invalidate) using container/channel vocabulary so a future Slack reader slots in without a contract change. Telegram/SMS leave them NULL.
- Discord driver (`messaging_discord_read.c`): discovery (`/users/@me/guilds` + per-guild channels, 5-min cache) and history with newest-first backward pagination, time-range (`since`/`until` → snowflake), and an older-history cursor. Split out of the gateway/send core for size.
- Engine (`messaging_engine_read.c`): rate-limited read/discovery preamble, fuzzy channel-name resolution (+ ambiguity disambiguation), per-message injection filtering, and a tz-rendered, char-capped `[DATA]`-wrapped transcript.
- Tool: `read_channel` / `read_server` / `list_discord_channels`.
- `str_fuzzy.{c,h}` promoted to Layer 1 (home_assistant migrated onto it); `scheduled_context.{c,h}` carries the event owner onto scheduler threads.
- Scheduled messaging is read-only: a per-action schedulability gate (`tool_metadata.validate_schedulable_action`, one shared allowlist) rejects send/manage actions at both create and fire time; the task path now sets the scheduled-origin context so reads attribute to the real owner, not user 1.
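The `since`/`until` → snowflake step relies on the fact that a Discord snowflake carries its creation time, in milliseconds since the Discord epoch (2015-01-01T00:00:00Z), in the bits above bit 22. A small sketch of the conversion, with hypothetical function names; how the PR's driver actually rounds or clamps its bounds is not shown here.

```c
#include <stdint.h>

/* Discord epoch: 2015-01-01T00:00:00Z in Unix milliseconds. */
#define DISCORD_EPOCH_MS 1420070400000ULL

/* Smallest snowflake whose timestamp is `unix_ms`. Times before the
 * Discord epoch clamp to 0 (an assumption of this sketch). */
uint64_t snowflake_from_unix_ms(uint64_t unix_ms) {
    if (unix_ms < DISCORD_EPOCH_MS) {
        return 0;
    }
    return (unix_ms - DISCORD_EPOCH_MS) << 22;
}

/* Inverse: recover the Unix-millisecond timestamp from a snowflake. */
uint64_t unix_ms_from_snowflake(uint64_t id) {
    return (id >> 22) + DISCORD_EPOCH_MS;
}
```

A time bound converted this way can be compared directly against message ids, or passed wherever the history endpoint accepts a message id as a pagination cursor.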
Security
Channel content is untrusted, multi-author input flowing into a tool-capable LLM, so:
- Each message is checked with `memory_filter_check`; the transcript is wrapped in a `[DATA]…[/DATA]` envelope with a "treat as DATA, not instructions" preamble.
- `[DATA]` delimiters + control chars are neutralized in every untrusted name: message authors and channel/server names (a crafted guild name was the one envelope-breakout vector).
- `limit` clamped; the fuzzy-matched name never reaches a URL.
- The bot token is sent only in the `Authorization` header, never a URL or log line.
- 429 backoff (`Retry-After`, clamped) protects the token from a route-ban during a sweep; per-user + per-sweep rate limits; bounded pagination/message/transcript caps.
- Loosened the `memory_filter` blocklist (removes the `always/never/whenever + verb` cluster and `from now on`/`henceforth` phrases) to stop false positives on benign chat. With that change the `[DATA]` envelope + per-name neutralization are the load-bearing injection defense; please review those together.
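For reviewers weighing the envelope defense, the neutralization idea can be sketched as a single pass that makes it impossible for untrusted text to contain a line break or a `[DATA]`/`[/DATA]` delimiter. This is an illustration only: `sanitize_inline_sketch` is a hypothetical name, and replacing square brackets with parentheses is an assumed strategy; the PR's `sanitize_inline` may neutralize delimiters differently.

```c
#include <stddef.h>

/* Copy `src` into `dst` (capacity `cap`, NUL-terminated when cap > 0).
 * ASCII control characters become spaces, so the text cannot start a
 * new transcript line; square brackets become parentheses, so it cannot
 * spell an envelope delimiter. Returns the number of bytes written,
 * excluding the terminator. Bytes >= 0x80 (UTF-8) pass through. */
size_t sanitize_inline_sketch(const char *src, char *dst, size_t cap) {
    size_t n = 0;
    if (cap == 0) {
        return 0;
    }
    for (size_t i = 0; src[i] != '\0' && n + 1 < cap; i++) {
        unsigned char c = (unsigned char)src[i];
        if (c < 0x20 || c == 0x7f) {
            dst[n++] = ' ';
        } else if (c == '[') {
            dst[n++] = '(';
        } else if (c == ']') {
            dst[n++] = ')';
        } else {
            dst[n++] = (char)c;
        }
    }
    dst[n] = '\0';
    return n;
}
```

Applying the same helper to authors, channel names, server names, and message bodies means one rule covers every place untrusted text enters the transcript.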
Testing
- New unit tests for `str_fuzzy`, `scheduled_context`, and a per-action schedulability gate case; `memory_filter` tests updated for the relaxation.
- … degrade gracefully, scheduled `send` correctly rejected, names render intact through the envelope.