Track initial local Discord maintainer-archive use friction #100

Description

@joshka

Context

I am using Discrawl, alongside Gitcrawl, as a local archive layer to widen my scope of vision as an OSS maintainer. The intended workflow is that Codex can query these local archives as needed, then use live services only when freshness, permissions, or write actions require it.

The parallel Gitcrawl tracking issue is openclaw/gitcrawl#81. This issue is the Discrawl side of the same maintainer-archive workflow, focused on real friction observed while using a local Discord archive through Codex.

This issue tracks observations from initial Discrawl use across bot-backed archive sync, local Discord Desktop cache import, private/public snapshot checks, and stop/status diagnostics. The goal is not to draw firm product conclusions from one setup, but to capture concrete friction that showed up while operating Discrawl as local maintainer context.

Shared crawler/cache coordination across multiple crawler tools is intentionally out of scope for this issue. That may be useful elsewhere, but this issue is focused on direct Discrawl maintainer-archive use.

What Worked

  • A local SQLite archive was useful for asking Discord-history questions without depending on Discord search.
  • discrawl status --json and discrawl doctor were the fastest basic health checks.
  • discrawl sync --source wiretap provided a practical no-bot path for local Discord Desktop cache data.
  • discrawl wiretap --watch-every could run while manually browsing/scrolling Discord Desktop to improve local coverage.
  • discrawl sql was useful for precise read-only inspection when normal query commands were too coarse.
  • The existing search, messages, dms, tui, status --json, doctor --json, sync --source, wiretap, analytics, and publish --public-only surfaces covered a lot of the basic workflow once the right command shapes were known.
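The read-only inspection mentioned above can also be approximated outside of Discrawl. The following is a minimal sketch, assuming only that the archive is an ordinary SQLite file; it makes no assumption about Discrawl's actual schema and simply lists every table with its row count. The function name `table_row_counts` and any path passed to it are hypothetical, not part of Discrawl.

```python
import sqlite3
from pathlib import Path


def table_row_counts(db_path):
    """Return {table_name: row_count} for every user table in a SQLite file.

    The file is opened with mode=ro, so this cannot modify the archive.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names do not break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling `table_row_counts("/path/to/archive.db")` with the real archive location gives a quick first look at which tables exist before writing narrower queries with discrawl sql.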

Observations

  • Some useful archive questions quickly became coverage questions: how many messages are present, which channels are named vs synthetic, what is earliest/latest per channel, whether history is complete or just locally cached, and whether wiretap skipped anything.
  • During local-only wiretap harvesting, progress needed a separate ad hoc SQLite stats loop while wiretap --watch-every ran. The useful stats were total messages, message channels, channel records, named channels, synthetic channels, top channels, and import deltas.
  • Manual scrolling in Discord Desktop could materially improve local coverage, but the tool did not provide a first-class way to monitor that harvesting progress.
  • Confirming that Discrawl had actually stopped required shell process checks for discrawl, wiretap, and sqlite3, then a read-only pragma quick_check, then discrawl status --json. The status output confirmed archive state, but not whether any process was still writing to the archive.
  • The default discrawl sync source mix affected publish classification. In one workflow, a combined bot plus wiretap sync left guild metadata insufficient for public/private publish classification, and a follow-up discrawl sync --source discord repaired the role/visibility metadata.
  • Channel-name ambiguity showed up in real use. A helper using the numeric channel id was more reliable than a bare channel name like help.
  • Sparse database errors made sync failures harder to interpret. The attachment insert path needed row-level context before deciding whether behavior should change.
  • Some useful workflows were already supported but easy to miss from memory alone, especially wiretap --watch-every, read-only discrawl sql, source-specific sync, and the local-only DM boundary.
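The ad hoc stats loop and the read-only quick_check described in the observations could look roughly like the sketch below. It is illustrative only: the table and column names (`messages`, `channel_id`, `created_at`) are a hypothetical schema chosen for the example, not Discrawl's documented layout, and the helper names are invented here. Only the SQLite behavior (read-only URI mode and `PRAGMA quick_check`) is standard.

```python
import sqlite3
from pathlib import Path

# HYPOTHETICAL schema: these names are assumptions for illustration only.
# Inspect the real archive's tables before adapting any of this.
MESSAGES_TABLE = "messages"
CHANNEL_COLUMN = "channel_id"
TIMESTAMP_COLUMN = "created_at"


def open_read_only(db_path):
    """Open a SQLite file with mode=ro so the check cannot write to it."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def channel_coverage(conn):
    """Return {channel: (message_count, earliest, latest)}."""
    sql = (
        f"SELECT {CHANNEL_COLUMN}, COUNT(*), "
        f"MIN({TIMESTAMP_COLUMN}), MAX({TIMESTAMP_COLUMN}) "
        f"FROM {MESSAGES_TABLE} GROUP BY {CHANNEL_COLUMN}"
    )
    return {row[0]: (row[1], row[2], row[3]) for row in conn.execute(sql)}


def import_deltas(before, after):
    """Return {channel: new_message_count} for channels that grew."""
    deltas = {}
    for channel, (count, _, _) in after.items():
        previous = before.get(channel, (0, None, None))[0]
        if count > previous:
            deltas[channel] = count - previous
    return deltas


def quick_check_ok(conn):
    """Run PRAGMA quick_check; SQLite returns a single 'ok' row on success."""
    return conn.execute("PRAGMA quick_check").fetchall() == [("ok",)]
```

Taking a `channel_coverage` snapshot before and after a scrolling session and passing both to `import_deltas` shows which channels gained messages. This does not answer whether another process is still writing; that still needs the shell process checks described above.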

Related Follow-Up Issues

Non-Goals

  • This is not proposing one large redesign.
  • This is not about shared crawler/cache coordination across multiple crawler tools.
  • This is not trying to make local Discord Desktop cache import equivalent to complete Discord history.
  • This is not asking Discrawl to use user tokens or behave like a selfbot.
  • This is not assuming every archive should publish public snapshots.
  • This is not a replacement for narrower follow-up issues with concrete command/API proposals.

Prepared with Codex, confirmed as accurate by human.

Metadata

Labels

  • P3: Low-risk cleanup, docs, polish, ergonomics, or speculative feature.
  • clawsweeper:needs-maintainer-review: ClawSweeper marked this issue as needing maintainer review before automation.
  • clawsweeper:needs-product-decision: ClawSweeper marked this issue as needing a product or behavior decision.
  • clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
  • impact:other: This issue has meaningful maintainer-visible impact outside the owned taxonomy.
  • issue-rating: 🌊 off-meta tidepool: Issue quality rating does not apply to this item.
