collectiveQ is an AI-assisted community insight tool that ingests Discord message history and produces a human-readable report of recurring issues, patterns, and trends. The focus is on helping moderators understand blockers and learning gaps without taking any automated action.
- Ingest: Load Discord export JSON, normalize fields, filter bot/system messages.
- Preprocess: Clean text (lowercase, remove URLs, tag code blocks).
- Embed: Generate semantic embeddings and cache them locally.
- Cluster: Group similar messages with HDBSCAN.
- Summarize: Generate human-readable summaries for each cluster.
- Report: Create a Markdown report of recurring blockers and trends.
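The Ingest and Preprocess steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the `[code]` placeholder, and the message field names (`author.isBot`, `type`) are assumptions about the export format.

```python
import re

FENCE = "`" * 3  # Markdown code fence, built this way to avoid writing a literal fence
CODE_BLOCK_RE = re.compile(re.escape(FENCE) + r".*?" + re.escape(FENCE), re.DOTALL)
URL_RE = re.compile(r"https?://\S+")


def keep_message(msg: dict) -> bool:
    """Drop bot and system messages (field names are assumed, not confirmed)."""
    is_bot = msg.get("author", {}).get("isBot", False)
    is_default = msg.get("type", "Default") == "Default"
    return not is_bot and is_default


def preprocess(text: str) -> str:
    """Tag code blocks, remove URLs, lowercase, and collapse whitespace."""
    text = CODE_BLOCK_RE.sub(" [code] ", text)  # replace each code block with a tag
    text = URL_RE.sub(" ", text)                # remove URLs
    return " ".join(text.lower().split())
```

Code blocks are tagged before URLs are stripped so that a URL inside a code block disappears with the block instead of leaving a half-matched fragment behind.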
Run the API server and a Celery worker locally:
`uvicorn src.app.main:app --reload`
`celery -A src.app.celery_app.celery_app worker --loglevel=info`
Start services:
`make dev`
Stop services:
`make down`
Rebuild images (after dependency changes):
`docker compose build`
This API provides:
- `POST /upload` (multipart upload)
- `POST /ingest` (full pipeline with `upload_id`)
- `POST /preprocess` (start run with `upload_id`)
- `POST /embed` (requires `run_id`)
- `POST /cluster` (requires `run_id`)
- `POST /insights` (summarize clusters for `run_id`)
- `POST /summary` (generate report for `run_id`)
- `GET /status/{task_id}`
- `GET /insights/{task_id}`
- `GET /summary/{task_id}`
- `GET /insights/run/{run_id}`
- `GET /summary/run/{run_id}`
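Because the pipeline endpoints start background tasks, a client typically polls `GET /status/{task_id}` until the task finishes. The sketch below shows one way to do that with the standard library. The response shape is an assumption: it supposes the status endpoint returns JSON with a `state` field holding a Celery state name such as `SUCCESS` or `FAILURE`. The helper names and the base URL are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

# Celery states after which a task will not change again.
TERMINAL_STATES = {"SUCCESS", "FAILURE", "REVOKED"}


def fetch_status(base_url: str, task_id: str) -> dict:
    """Call GET /status/{task_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/status/{task_id}") as resp:
        return json.load(resp)


def wait_for_task(
    task_id: str,
    get_status: Callable[[str], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll get_status until the assumed `state` field is terminal or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(task_id)
        if status.get("state") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
        sleep(interval)
```

Against a local server this might be called as `wait_for_task(task_id, lambda t: fetch_status("http://localhost:8000", t))`. Passing the status function in as a parameter keeps the polling loop testable without a running API.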
Outputs are stored under:
- `data_api/uploads/`
- `data_api/runs/{run_id}/`
Copy .env.example to .env and fill in values:
- `OPENAI_API_KEY`
- `CELERY_BROKER_URL` / `CELERY_RESULT_BACKEND`
- `CORS_ALLOW_ORIGINS`
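As an illustration of how these variables might be consumed, the sketch below reads them into a small settings object. It is not the project's loader: the `Settings` class and `load_settings` function are hypothetical, the empty-string default for the API key is an assumption, and so is the idea that `CORS_ALLOW_ORIGINS` is a comma-separated list.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    celery_broker_url: str
    celery_result_backend: str
    cors_allow_origins: Tuple[str, ...]


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    # Assumption: origins are given as a comma-separated string.
    origins = tuple(
        origin.strip()
        for origin in env.get("CORS_ALLOW_ORIGINS", "").split(",")
        if origin.strip()
    )
    return Settings(
        openai_api_key=env.get("OPENAI_API_KEY", ""),
        celery_broker_url=env["CELERY_BROKER_URL"],          # KeyError if unset
        celery_result_backend=env["CELERY_RESULT_BACKEND"],  # KeyError if unset
        cors_allow_origins=origins,
    )
```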
Embeddings:
- OpenAI: `provider=openai`, model defaults to `text-embedding-3-small` (requires `OPENAI_API_KEY`)
- SentenceTransformers: `provider=sentence_transformers`, a default model can be configured (e.g., `all-MiniLM-L6-v2`)
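Local caching of embeddings could work along the following lines. This is a sketch under stated assumptions, not the project's cache: the on-disk layout (one JSON file per text, named by a SHA-256 of provider, model, and text), the `embed_with_cache` function, and the `DEFAULT_MODELS` table are all hypothetical. The actual provider call is passed in as `embed_fn`, so the same wrapper would fit either provider.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List, Optional, Sequence

# Assumed defaults; the SentenceTransformers entry is only the example model named above.
DEFAULT_MODELS = {
    "openai": "text-embedding-3-small",
    "sentence_transformers": "all-MiniLM-L6-v2",
}


def cache_key(provider: str, model: str, text: str) -> str:
    """Stable key so a change of provider or model never reuses stale vectors."""
    payload = "\x00".join((provider, model, text)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def embed_with_cache(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    provider: str,
    model: Optional[str] = None,
    cache_dir: str = "embedding_cache",
) -> List[List[float]]:
    """Return one vector per text, calling embed_fn only for uncached texts."""
    model = model or DEFAULT_MODELS[provider]
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    paths = [cache / f"{cache_key(provider, model, t)}.json" for t in texts]
    missing = [i for i, p in enumerate(paths) if not p.exists()]
    if missing:
        vectors = embed_fn([texts[i] for i in missing])
        for i, vec in zip(missing, vectors):
            paths[i].write_text(json.dumps(list(vec)))
    return [json.loads(p.read_text()) for p in paths]
```

Keying the cache on provider and model as well as text means switching from OpenAI to SentenceTransformers recomputes vectors rather than mixing embeddings from different spaces.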
Summaries:
- Default uses OpenAI via `summarize.py`
- You can inject a local summarizer for offline runs
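To illustrate what injecting a local summarizer might look like, the sketch below treats a summarizer as any callable that maps a cluster's messages to a string. The interface is hypothetical: the signature expected by `summarize.py` is not documented here, and `keyword_summarizer` and `summarize_clusters` are invented names for the example.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

# Assumed interface: a summarizer takes a cluster's messages and returns text.
Summarizer = Callable[[Sequence[str]], str]

STOPWORDS = {"the", "and", "for", "with", "this", "that", "not", "are", "but"}


def keyword_summarizer(messages: Sequence[str], top_n: int = 3) -> str:
    """Offline stand-in for an LLM summary: report the most frequent terms."""
    counts = Counter(
        word
        for message in messages
        for word in message.lower().split()
        if len(word) > 2 and word not in STOPWORDS
    )
    top = [word for word, _ in counts.most_common(top_n)]
    return f"{len(messages)} messages; frequent terms: {', '.join(top)}"


def summarize_clusters(
    clusters: Dict[int, Sequence[str]], summarizer: Summarizer
) -> Dict[int, str]:
    """Apply the injected summarizer to every cluster."""
    return {cluster_id: summarizer(msgs) for cluster_id, msgs in clusters.items()}
```

A frequency-based summary is far cruder than an LLM summary, but it needs no network access or API key, which is the point of an offline run.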
Run all unit tests with:
`python -m unittest discover -s tests`
Project layout:
- `src/`: pipeline modules
- `src/app/`: FastAPI + Celery app
- `data_api/`: API uploads and run outputs
- `docs/`: module documentation and sample runs
- Pipeline runs via background tasks.
- No moderation actions are performed; this is insight-only.