Collaborative data review for AI datasets — voting, threaded discussion, and PCA embedding maps over LLM-labeled examples and ML profiling reports.
ox-collab is the social and visualization layer in the oasis-data product line. It wraps two upstream tools — argilla (LLM dataset labeling) and ydata-profiling (tabular ML profiling) — with a unified UI that lets a research team upvote, discuss, and explore datasets together.
Status: WIP. Not yet production-ready. Local Docker stack works; LAN deployment is the next milestone.
ox-collab/
├── api/                      FastAPI backend — votes, comments, mentions, embeddings, auth toggle
├── frontend/                 React + Mantine + Plotly UI — record cards, threads, 1D/2D/3D/4D viewer
├── docs/                     Architecture, deployment, adapter notes
├── docker-compose.yaml       full local stack (postgres + api + frontend)
├── docker-compose.lan.yaml   overlay to expose on the LAN
└── .swarm/                   cross-cutting project coordination (dot_swarm protocol)
Each subdirectory can be developed independently: api/.swarm/ and frontend/.swarm/ track per-component work, while the top-level .swarm/ tracks cross-cutting items such as releases and integration with the forks.
ox-collab does not produce dataset records itself; they come from sibling repos in the oasis-data family:
| Repo | Role | Upstream |
|---|---|---|
| oasis-main/ox-collab | This repo. Collaboration UI + API. | new |
| oasis-main/ox-llm-data-collab | LLM dataset labeling | argilla-io/argilla |
| oasis-main/ox-ml-data-collab | Tabular ML profiling | ydata-profiling family |
The two fork repos each contain an oasis-extensions/ox_collab_adapter.py that pushes their records into ox-collab-api for collaboration. See docs/ADAPTERS.md for details.
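As a rough illustration of what such an adapter might do, the sketch below builds a POST request that hands one record to ox-collab-api. It is not the code in ox_collab_adapter.py: the `/records` path is taken from the architecture diagram, while the use of POST, the payload fields (`source`, `external_id`, `text`) and the helper names are assumptions made for this example. docs/ADAPTERS.md remains the authority on the real contract.

```python
import json
import urllib.request

# Assumption: a locally running ox-collab-api on the port named in this README.
OX_COLLAB_API = "http://localhost:8001"


def build_record_request(source: str, external_id: str, text: str,
                         api_base: str = OX_COLLAB_API) -> urllib.request.Request:
    """Build (but do not send) a POST that hands one record to ox-collab-api.

    All payload fields are hypothetical; check docs/ADAPTERS.md for the real schema.
    """
    payload = {
        "source": source,            # e.g. "llm" or "ml" (hypothetical values)
        "external_id": external_id,  # id of the record in the fork's own store
        "text": text,                # content to show on the record card
    }
    return urllib.request.Request(
        url=f"{api_base}/records",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def push_record(request: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a running API; `push_record(build_record_request("llm", "rec-1", "example text"))` would perform the actual call.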
- Records browser — filter by source (LLM data / ML profiling / all), score-sorted, paginated.
- Voting — upvote / neutral / downvote. One vote per record per user; aggregate score on every card.
- Threaded comments — markdown body, infinite reply depth, @mentions parsed server-side and stored as notifications.
- Embedding map — PCA projection of every record into:
  - 1D strip plot (density)
  - 2D scatter (color by source)
  - 3D rotatable scatter
  - 4D = 3D + scrubbable time slider with play/pause, for watching the manifold evolve as records are added or relabeled.
- Optional auth — host sets OX_AUTH_REQUIRED=true|false. False = anonymous-ok mode (X-Anon-Name header). True = JWT required for writes; reads stay open.
- LAN-friendly — docker-compose.lan.yaml overlay binds to 0.0.0.0 and advertises via mDNS so colleagues on the same network reach http://<host>.local:8002 without configuration.
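The voting rule in the list above (one vote per record per user, with an aggregate score per record) can be sketched with a composite primary key. This is a minimal illustration, not the project's schema: it uses an in-memory SQLite database as a stand-in for Postgres, and the table layout, the function names, and the mapping of upvote / neutral / downvote to +1 / 0 / -1 are all assumptions.

```python
import sqlite3

# Assumption: the three vote states map to +1 / 0 / -1.
VOTE_VALUES = {"upvote": 1, "neutral": 0, "downvote": -1}


def open_store() -> sqlite3.Connection:
    """In-memory SQLite stand-in for the votes table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE votes (
            record_id TEXT NOT NULL,
            user_id   TEXT NOT NULL,
            value     INTEGER NOT NULL CHECK (value IN (-1, 0, 1)),
            PRIMARY KEY (record_id, user_id)  -- one vote per record per user
        )
        """
    )
    return conn


def cast_vote(conn: sqlite3.Connection, record_id: str, user_id: str, vote: str) -> None:
    """Store the user's vote, replacing any earlier vote on the same record."""
    conn.execute(
        "INSERT OR REPLACE INTO votes (record_id, user_id, value) VALUES (?, ?, ?)",
        (record_id, user_id, VOTE_VALUES[vote]),
    )


def score(conn: sqlite3.Connection, record_id: str) -> int:
    """Aggregate score: the sum of all vote values for the record (0 if none)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(value), 0) FROM votes WHERE record_id = ?",
        (record_id,),
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = open_store()
    cast_vote(conn, "rec-1", "ana", "upvote")
    cast_vote(conn, "rec-1", "ben", "upvote")
    cast_vote(conn, "rec-1", "ana", "downvote")  # replaces ana's earlier upvote
    print(score(conn, "rec-1"))
```

Because uniqueness is enforced by the key rather than by application code, a repeated vote from the same user changes their existing row instead of adding a second one.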
git clone https://github.com/oasis-main/ox-collab.git
cd ox-collab
# Bring up the full stack (postgres + api + frontend)
docker compose --profile full up
# Open http://localhost:8002 — API at http://localhost:8001
To expose on the LAN:
docker compose -f docker-compose.yaml -f docker-compose.lan.yaml --profile full --profile lan up
For local dev without Docker, see api/README.md and frontend/README.md.
┌─────────────────────────────────────────────────────────────────┐
│ React + Mantine + Plotly (port 8002, served by nginx) │
│ routes: /, /records/:id, /embedding │
└──────────────────────────────┬──────────────────────────────────┘
│ /api/* reverse-proxy
┌──────────────────────────────▼──────────────────────────────────┐
│ FastAPI (port 8001) │
│ /records /votes /comments /embeddings /auth │
│ + auth-toggle middleware │
│ + lazy sentence-transformers worker for embeddings │
└──────────────────────────────┬──────────────────────────────────┘
│
┌──────────────┴──────────────┐
▼ ▼
Postgres 16 (records, sentence-transformers
votes, comments, all-MiniLM-L6-v2
embeddings, mentions) (loaded on first /embeddings/refresh)
▲ ▲
│ │
┌───────┴────────┐ ┌──────────┴────────┐
│ ox-llm-data- │ │ ox-ml-data- │
│ collab adapter │ │ collab adapter │
│ (argilla API) │ │ (profiling JSON) │
└────────────────┘ └───────────────────┘
Full schema and design tradeoffs in docs/ARCHITECTURE.md.
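The auth-toggle middleware in the diagram, combined with the OX_AUTH_REQUIRED behaviour described in the feature list, boils down to a small decision rule. The sketch below expresses that rule as a plain function, independent of FastAPI. The function names, the set of methods treated as writes, the fallback name "anonymous", and the handling of values other than "true" are assumptions for illustration; the actual middleware may differ.

```python
from __future__ import annotations

import os

# Assumption: these HTTP methods count as writes; everything else is a read.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def auth_required_from_env(env=os.environ) -> bool:
    """Read OX_AUTH_REQUIRED; any value other than 'true' is treated as false (assumption)."""
    return env.get("OX_AUTH_REQUIRED", "false").strip().lower() == "true"


def authorize(method: str, auth_required: bool, has_valid_jwt: bool = False,
              anon_name: str | None = None) -> tuple[bool, str | None]:
    """Return (allowed, display_name) for one request.

    display_name is only filled in for anonymous-mode writes; with a JWT the
    caller would take the identity from the token instead.
    """
    if method.upper() not in WRITE_METHODS:
        return True, None                    # reads stay open in both modes
    if auth_required:
        return has_valid_jwt, None           # writes need a valid JWT
    return True, (anon_name or "anonymous")  # anonymous-ok mode (X-Anon-Name value)
```

For example, `authorize("POST", auth_required=True)` rejects the request, while `authorize("POST", auth_required=False, anon_name="ana")` accepts it and attributes the write to "ana". A middleware would call such a function with the request method, the parsed token state, and the X-Anon-Name header value.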
- v0 scaffold — api + frontend + Docker + adapters
- OXC-001: end-to-end smoke test on a clean machine
- OXC-002: deploy on LAN with mDNS, validate multi-user collaboration
- OXC-003: oasis-auth SSO integration
- OXC-004: WebSocket live updates (votes/comments)
- OXC-005: ModalSheaf integration for cross-source consistency scoring
- OXC-006: oasis-cloud production deployment manifests
See .swarm/queue.md for the full coordination queue.
ox-collab (this repo's original code) is licensed under Apache-2.0.
The sibling fork repos (ox-llm-data-collab, ox-ml-data-collab) inherit the upstream licenses (Apache-2.0 and MIT respectively); see each fork's LICENSE and NOTICE files for attribution.
Part of oasis-data — collaborative tooling for AI dataset development.