Add archive coverage and wiretap progress reporting #101

Description

@joshka

Parent: #100

Related Gitcrawl context: openclaw/gitcrawl#81

Observed Workflow

I am using Discrawl and Gitcrawl as local archives to widen maintainer visibility across OSS work, with Codex querying the archives as needed.

During initial Discrawl use, importing the local-only Discord Desktop cache was useful, especially where bot access was unavailable. Running discrawl wiretap --watch-every while manually browsing and scrolling Discord Desktop could harvest more messages and channel metadata over time.

The hard part was answering: "what is actually covered?"

Current Workaround

I had to use ad hoc SQLite queries and a separate stats loop to inspect coverage and harvesting progress.

Useful observed stats included:

  • total messages
  • message-capable channels
  • total channel records
  • named channels
  • synthetic channels
  • top channels by message count
  • earliest/latest cached messages
  • skipped_messages and skipped_channels
  • growth between import passes

This worked, but it is not something a user or Codex agent should have to reconstruct manually.
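
A minimal sketch of the kind of ad hoc query this involves, written against an assumed schema. The table and column names (`channels`, `messages`, `name`, `created_at`) and the rule that an unnamed channel counts as synthetic are hypothetical and chosen for illustration; Discrawl's real schema may differ. The inserted rows are made-up sample data.

```python
import sqlite3

# Hypothetical schema, for illustration only; Discrawl's real tables may differ.
SCHEMA = """
CREATE TABLE channels (id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE messages (id TEXT PRIMARY KEY, channel_id TEXT, created_at TEXT);
"""


def coverage_stats(conn, top_n=5):
    """Summarize coverage from the assumed channels/messages tables."""
    total, earliest, latest = conn.execute(
        "SELECT COUNT(*), MIN(created_at), MAX(created_at) FROM messages"
    ).fetchone()
    channel_records = conn.execute("SELECT COUNT(*) FROM channels").fetchone()[0]
    # Assumption: a channel record without a name is a synthetic placeholder.
    named = conn.execute(
        "SELECT COUNT(*) FROM channels WHERE name IS NOT NULL AND name <> ''"
    ).fetchone()[0]
    top = conn.execute(
        "SELECT channel_id, COUNT(*) AS n FROM messages "
        "GROUP BY channel_id ORDER BY n DESC, channel_id LIMIT ?",
        (top_n,),
    ).fetchall()
    return {
        "total_messages": total,
        "channel_records": channel_records,
        "named_channels": named,
        "synthetic_channels": channel_records - named,
        "earliest_message": earliest,
        "latest_message": latest,
        "top_channels": top,
    }


# Made-up sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO channels VALUES (?, ?)", [("c1", "general"), ("c2", None)]
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?)",
    [
        ("m1", "c1", "2024-01-01T00:00:00Z"),
        ("m2", "c1", "2024-01-02T00:00:00Z"),
        ("m3", "c2", "2024-01-03T00:00:00Z"),
    ],
)
stats = coverage_stats(conn)
```

Running a function like this once per import pass and comparing the results gives the growth numbers, which is the loop a built-in command would replace.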

Request

Add first-class archive coverage and wiretap progress reporting.

Example shape:

discrawl coverage --json
discrawl coverage --guild 968932220549103686 --json
discrawl wiretap --watch-every 10s --stats --json

Useful Output

  • guild id and name
  • message count
  • channel count
  • named vs synthetic channel count
  • earliest and latest message timestamps
  • per-channel message counts
  • per-channel earliest/latest timestamps
  • history-complete marker when known
  • wiretap skipped messages/channels
  • last bot sync time
  • last wiretap import time
  • deltas since the previous sample, when running in watch mode
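
To illustrate the watch-mode deltas, here is a small sketch of how two consecutive samples could be compared. The `CoverageSample` type, its field names, and the counter values are all hypothetical; only the guild id is taken from the example commands above. Nothing here reflects an existing Discrawl output format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class CoverageSample:
    # Hypothetical field names, loosely following the list above.
    guild_id: str
    message_count: int
    channel_count: int
    skipped_messages: int
    skipped_channels: int


COUNTERS = ("message_count", "channel_count", "skipped_messages", "skipped_channels")


def sample_delta(prev, curr):
    """Return the growth of each counter between two samples of one guild."""
    if prev.guild_id != curr.guild_id:
        raise ValueError("samples must describe the same guild")
    return {name: getattr(curr, name) - getattr(prev, name) for name in COUNTERS}


# Illustrative values only.
prev = CoverageSample("968932220549103686", 1200, 40, 3, 1)
curr = CoverageSample("968932220549103686", 1275, 42, 3, 2)

# One possible JSON record per sample: the current counters plus the deltas.
report = {**asdict(curr), "delta": sample_delta(prev, curr)}
print(json.dumps(report, indent=2))
```

Emitting the delta alongside the absolute counters lets an agent tell at a glance whether a wiretap pass is still harvesting new data or has plateaued.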

Why This Matters

Coverage is the first decision point before relying on a local archive. It tells a maintainer or Codex agent whether the archive is ready to answer a question locally, whether a channel needs more sync work, or whether local Discord Desktop cache data is only partial.

Acceptance Criteria

  • A JSON mode suitable for agent/tool use.
  • A human-readable table mode.
  • Works across all guilds in the DB by default.
  • Can filter to a specific guild.
  • Can surface named vs synthetic channel counts without custom SQL.
  • Can surface wiretap skipped-message counts without reading raw import logs.
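
A rough sketch of how one reporting function could satisfy the JSON, table, all-guilds, and single-guild criteria together. The function name, its parameters, and the row keys (`guild_id`, `named`, `synthetic`, `skipped_messages`) are assumptions made for illustration, not a proposed Discrawl API.

```python
import json

COLUMNS = ["guild_id", "named", "synthetic", "skipped_messages"]


def render_coverage(rows, guild_id=None, as_json=False):
    """Render coverage rows as JSON or as a plain-text table.

    rows is a list of dicts keyed by COLUMNS (hypothetical names).
    With guild_id=None, every guild is included.
    """
    if guild_id is not None:
        rows = [r for r in rows if r["guild_id"] == guild_id]
    if as_json:
        return json.dumps(rows, indent=2)
    # Size each column to its widest cell, including the header.
    widths = [max([len(c)] + [len(str(r[c])) for r in rows]) for c in COLUMNS]
    lines = ["  ".join(c.ljust(w) for c, w in zip(COLUMNS, widths)).rstrip()]
    for r in rows:
        cells = (str(r[c]).ljust(w) for c, w in zip(COLUMNS, widths))
        lines.append("  ".join(cells).rstrip())
    return "\n".join(lines)


# Made-up rows for two guilds.
rows = [
    {"guild_id": "1", "named": 3, "synthetic": 2, "skipped_messages": 0},
    {"guild_id": "2", "named": 7, "synthetic": 1, "skipped_messages": 4},
]
table = render_coverage(rows)
only_first = render_coverage(rows, guild_id="1", as_json=True)
```

Keeping the filter and the data identical across both modes means the table and the JSON can never disagree, which matters when a human and an agent are looking at the same archive.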

Prepared with Codex, confirmed as accurate by human.

Metadata

Assignees

No one assigned

    Labels

    P3: Low-risk cleanup, docs, polish, ergonomics, or speculative feature.
    clawsweeper:needs-maintainer-review: ClawSweeper marked this issue as needing maintainer review before automation.
    clawsweeper:needs-product-decision: ClawSweeper marked this issue as needing a product or behavior decision.
    clawsweeper:no-new-fix-pr: ClawSweeper does not recommend queueing a new automated fix PR for this issue.
    impact:other: This issue has meaningful maintainer-visible impact outside the owned taxonomy.
    issue-rating: 🌊 off-meta tidepool: Issue quality rating does not apply to this item.
