
Bh3ky/collectiveQ


collectiveQ

collectiveQ is an AI-assisted community insight tool that ingests Discord message history and produces a human-readable report of recurring issues, patterns, and trends. The focus is on helping moderators understand blockers and learning gaps without taking any automated action.

Pipeline Overview

  1. Ingest: Load a Discord JSON export, normalize fields, and filter out bot and system messages.
  2. Preprocess: Clean text (lowercase, remove URLs, tag code blocks).
  3. Embed: Generate semantic embeddings and cache them locally.
  4. Cluster: Group similar messages with HDBSCAN.
  5. Summarize: Generate human-readable summaries for each cluster.
  6. Report: Create a Markdown report of recurring blockers and trends.
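The Ingest and Preprocess steps can be sketched as follows. This is a minimal illustration, not collectiveQ's own code. It assumes the export follows the DiscordChatExporter JSON layout: a top-level messages list whose entries carry type, content, and author.isBot. The names load_messages and preprocess, and the <code_block> and <inline_code> tags, are hypothetical.

```python
import json
import re
from typing import Any, Dict, List

# ASSUMPTION: DiscordChatExporter-style JSON; collectiveQ's real field mapping may differ.
USER_MESSAGE_TYPES = {"Default", "Reply"}  # other types (joins, pins, ...) count as system messages

CODE_BLOCK_RE = re.compile(r"`{3}.*?`{3}", re.DOTALL)  # fenced code blocks
INLINE_CODE_RE = re.compile(r"`[^`\n]+`")               # inline code spans
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)


def load_messages(raw_json: str) -> List[Dict[str, Any]]:
    """Parse an export, drop bot/system messages, and normalize the rest (hypothetical schema)."""
    export = json.loads(raw_json)
    messages = []
    for msg in export.get("messages", []):
        author = msg.get("author") or {}
        if author.get("isBot") or msg.get("type") not in USER_MESSAGE_TYPES:
            continue  # skip bots and system events
        messages.append(
            {
                "id": msg.get("id"),
                "author": author.get("name"),
                "timestamp": msg.get("timestamp"),
                "text": msg.get("content") or "",
            }
        )
    return messages


def preprocess(text: str) -> str:
    """Tag code, remove URLs, lowercase, and collapse whitespace."""
    # Tag fenced code first so its contents never reach the URL or inline-code passes.
    text = CODE_BLOCK_RE.sub(" <code_block> ", text)
    text = INLINE_CODE_RE.sub(" <inline_code> ", text)
    text = URL_RE.sub(" ", text)
    text = text.lower()
    return re.sub(r"\s+", " ", text).strip()
```

Replacing code with a tag instead of deleting it keeps the fact that a message contained code, which is often a useful clustering signal on its own.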

Quickstart

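Start the API server:

uvicorn src.app.main:app --reload

In a separate terminal, start a Celery worker (it needs the broker configured in CELERY_BROKER_URL):

celery -A src.app.celery_app.celery_app worker --loglevel=info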

Docker Compose

Start services:

make dev

Stop services:

make down

Rebuild images (after dependency changes):

docker compose build

API

The FastAPI app exposes the following endpoints:

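  • POST /upload (multipart upload of a Discord export)
  • POST /ingest (run the full pipeline for an upload_id)
  • POST /preprocess (start a new run from an upload_id)
  • POST /embed (generate embeddings; requires run_id)
  • POST /cluster (cluster embeddings; requires run_id)
  • POST /insights (summarize clusters for a run_id)
  • POST /summary (generate the report for a run_id)
  • GET /status/{task_id} (status of a background task)
  • GET /insights/{task_id} (insights produced by a task)
  • GET /summary/{task_id} (report produced by a task)
  • GET /insights/run/{run_id} (insights for a run)
  • GET /summary/run/{run_id} (report for a run)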
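A client usually triggers a step with one of the POST endpoints and then polls GET /status/{task_id} until the background task finishes. The sketch below uses only the standard library. It assumes, without confirmation from this README, that the status response is JSON with a Celery-style state field (PENDING, STARTED, SUCCESS, FAILURE, ...). http_status_fetcher and wait_for_task are hypothetical helpers. The fetcher is injected so that the polling loop can be tested without a running server.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

# ASSUMPTION: GET /status/{task_id} returns {"state": "<Celery state>", ...}.
# Check the response model in src/app before relying on these field names.
READY_STATES = {"SUCCESS", "FAILURE", "REVOKED"}

StatusFetcher = Callable[[str], Dict[str, Any]]


def http_status_fetcher(base_url: str) -> StatusFetcher:
    """Return a function that calls GET {base_url}/status/{task_id} and decodes the JSON."""
    def fetch(task_id: str) -> Dict[str, Any]:
        with urllib.request.urlopen(f"{base_url}/status/{task_id}") as resp:
            return json.load(resp)
    return fetch


def wait_for_task(
    fetch_status: StatusFetcher,
    task_id: str,
    interval: float = 2.0,
    timeout: float = 600.0,
) -> Dict[str, Any]:
    """Poll until the task reaches a ready state; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(task_id)
        if status.get("state") in READY_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} still {status.get('state')!r} after {timeout}s")
        time.sleep(interval)
```

Against a local server started with the uvicorn command from Quickstart (port 8000 by default), you might call wait_for_task(http_status_fetcher("http://localhost:8000"), task_id). Here task_id comes from the triggering POST response; where exactly it appears in that response is also an assumption.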

Outputs are stored under:

  • data_api/uploads/
  • data_api/runs/{run_id}/

Environment

Copy .env.example to .env and fill in values:

  • OPENAI_API_KEY
  • CELERY_BROKER_URL / CELERY_RESULT_BACKEND
  • CORS_ALLOW_ORIGINS
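Loading these variables at startup can be sketched as below. This is a hypothetical loader, not the app's actual configuration code. It assumes CORS_ALLOW_ORIGINS is a comma-separated list. It treats OPENAI_API_KEY as optional, because the SentenceTransformers provider and an injected local summarizer do not need it.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

REQUIRED = ("CELERY_BROKER_URL", "CELERY_RESULT_BACKEND")


@dataclass(frozen=True)
class Settings:
    celery_broker_url: str
    celery_result_backend: str
    cors_allow_origins: List[str]
    openai_api_key: Optional[str] = None  # only needed for OpenAI embeddings/summaries


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment (or any mapping), failing fast on missing values."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    # ASSUMPTION: origins are comma-separated, e.g. "http://localhost:3000,https://example.org"
    origins = [o.strip() for o in env.get("CORS_ALLOW_ORIGINS", "").split(",") if o.strip()]
    return Settings(
        celery_broker_url=env["CELERY_BROKER_URL"],
        celery_result_backend=env["CELERY_RESULT_BACKEND"],
        cors_allow_origins=origins,
        openai_api_key=env.get("OPENAI_API_KEY") or None,
    )
```

A .env file does not populate os.environ by itself. Something else has to load it first, for example Docker Compose's env_file option or python-dotenv's load_dotenv().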

Providers

Embeddings:

  • OpenAI: provider=openai, model defaults to text-embedding-3-small (requires OPENAI_API_KEY)
  • SentenceTransformers: provider=sentence_transformers, runs locally; the default model is configurable (e.g., all-MiniLM-L6-v2)
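The provider defaults above, combined with the local cache from the Embed step, might look like the sketch below. It is an illustration under assumptions. EmbeddingCache, its JSON file layout, and the SHA-256 key over (provider, model, text) are all hypothetical. The embedding call itself is passed in as a function, so either provider can back the cache.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict, List, Optional, Sequence

# Defaults from this README; the SentenceTransformers one is only the example given there.
DEFAULT_MODELS = {
    "openai": "text-embedding-3-small",
    "sentence_transformers": "all-MiniLM-L6-v2",
}

EmbedFn = Callable[[List[str]], Sequence[Sequence[float]]]


class EmbeddingCache:
    """Hypothetical on-disk cache mapping SHA-256(provider, model, text) to a vector."""

    def __init__(self, path: Path, provider: str, model: Optional[str] = None) -> None:
        if provider not in DEFAULT_MODELS:
            raise ValueError(f"unknown embedding provider: {provider!r}")
        self.path = path
        self.provider = provider
        self.model = model or DEFAULT_MODELS[provider]
        self._store: Dict[str, List[float]] = (
            json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        )

    def _key(self, text: str) -> str:
        # Including provider and model keeps vectors from different models apart.
        raw = "\x00".join((self.provider, self.model, text)).encode("utf-8")
        return hashlib.sha256(raw).hexdigest()

    def embed(self, texts: Sequence[str], embed_fn: EmbedFn) -> List[List[float]]:
        """Return one vector per text, calling embed_fn only for uncached, de-duplicated texts."""
        missing = list(dict.fromkeys(t for t in texts if self._key(t) not in self._store))
        if missing:
            for text, vector in zip(missing, embed_fn(missing)):
                # float() also converts numpy scalars so the cache stays JSON-serializable
                self._store[self._key(text)] = [float(x) for x in vector]
            self.path.parent.mkdir(parents=True, exist_ok=True)
            self.path.write_text(json.dumps(self._store), encoding="utf-8")
        return [self._store[self._key(t)] for t in texts]
```

With SentenceTransformers, you could load model = SentenceTransformer(cache.model) once and pass lambda batch: model.encode(batch). With the OpenAI Python SDK (v1+), the function could return [d.embedding for d in client.embeddings.create(model=cache.model, input=batch).data]. Because the cache key includes the model name, switching models never returns stale vectors.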

Summaries:

  • By default, summaries are generated with OpenAI via summarize.py.
  • For offline runs, you can inject a local summarizer instead.
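Injecting a summarizer can be sketched with a small protocol: any object with a summarize method can stand in for the OpenAI-backed default. The names Summarizer, KeywordSummarizer, and summarize_clusters are hypothetical. This README does not document how summarize.py actually accepts a custom summarizer. The offline stand-in here only reports the most frequent terms. It also skips noise points, which HDBSCAN labels -1.

```python
from collections import Counter
from typing import Dict, List, Protocol, Sequence


class Summarizer(Protocol):
    """Anything that turns one cluster's messages into a short summary."""

    def summarize(self, messages: Sequence[str]) -> str: ...


class KeywordSummarizer:
    """Offline stand-in: lists the most frequent non-stopword terms in a cluster."""

    STOPWORDS = frozenset({"a", "an", "and", "for", "i", "in", "is", "it", "my", "of", "on", "the", "to", "with"})

    def __init__(self, top_k: int = 3) -> None:
        self.top_k = top_k

    def summarize(self, messages: Sequence[str]) -> str:
        words = Counter(
            w for m in messages for w in m.lower().split()
            if w.isalpha() and w not in self.STOPWORDS
        )
        top = [w for w, _ in words.most_common(self.top_k)]
        return f"{len(messages)} messages about: {', '.join(top)}"


def summarize_clusters(clusters: Dict[int, List[str]], summarizer: Summarizer) -> Dict[int, str]:
    """Summarize every cluster except HDBSCAN's noise label (-1)."""
    return {label: summarizer.summarize(msgs) for label, msgs in clusters.items() if label != -1}
```

An OpenAI-backed class with the same summarize signature would drop into summarize_clusters unchanged, which is what makes the offline swap possible.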

Tests

Run all unit tests with:

python -m unittest discover -s tests

Repo Structure

  • src/: pipeline modules
  • src/app/: FastAPI + Celery app
  • data_api/: API uploads and run outputs
  • docs/: module documentation and sample runs

Notes

  • Pipeline stages run as background Celery tasks; track progress with GET /status/{task_id}.
  • No moderation actions are performed; this is insight-only.

