Status: In Development — This project is under active development and not yet production-ready.
Data ingestion pipeline that retrieves signals from the Dataminr First Alert API, runs ML-based classification and clustering, and writes structured results (signals → events → alerts) into the CLEAR system via GraphQL.
```
Dataminr API → [poll_dataminr] → [process_signal] → CLEAR API (GraphQL)
                                         │
                                 ┌───────┴───────┐
                                 │  ML Pipeline  │
                                 │  classify →   │
                                 │  group →      │
                                 │  assess       │
                                 └───────────────┘
```
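The right-hand edge of the diagram is a GraphQL call into CLEAR. As an illustrative sketch only, here is how a request body for the `createSignal` mutation might be assembled; the input type name and selection set are assumptions, not the actual CLEAR schema:

```python
# Illustrative sketch: the real CLEAR schema and field names are not shown here.
CREATE_SIGNAL_MUTATION = """
mutation CreateSignal($input: CreateSignalInput!) {
  createSignal(input: $input) { id }
}
"""

def build_create_signal_request(signal_input: dict) -> dict:
    """Build the JSON body POSTed to the CLEAR GraphQL endpoint."""
    return {
        "query": CREATE_SIGNAL_MUTATION,
        "variables": {"input": signal_input},
    }
```

The resulting dict is what an HTTP client would serialize and POST to `CLEAR_API_URL` with the service-account key in a header.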
```shell
# Install dependencies
pip install uv
uv pip install --system .

# Copy env and fill in values
cp .env.example .env

# Run with Docker
docker compose up -d

# Or run locally (requires Redis)
celery -A src.celery_app worker --beat --loglevel=info
```

See `.env.example` for all available settings. Key variables:
| Variable | Description |
|---|---|
| `DATAMINR_CLIENT_ID` | Dataminr API client ID |
| `DATAMINR_CLIENT_SECRET` | Dataminr API client secret |
| `CLEAR_API_URL` | CLEAR GraphQL API endpoint |
| `CLEAR_API_KEY` | Service account API key (`sk_live_...`) |
| `ANTHROPIC_API_KEY` | API key for ML inference |
| `REDIS_URL` | Redis connection URL |
| `POLL_INTERVAL_SECONDS` | How often to poll Dataminr (default: 15) |
| `RELEVANCE_THRESHOLD` | Min relevance score for event creation (default: 0.5) |
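These variables are read into a settings object at startup (`src/config.py`). A stdlib-only sketch of how the defaults above might be applied; the real module may use a settings library and differ in field names:

```python
import os
from dataclasses import dataclass, field

def _env(name: str, default: str = "") -> str:
    return os.getenv(name, default)

@dataclass(frozen=True)
class Settings:
    # Hypothetical sketch of src/config.py; the real implementation may differ.
    dataminr_client_id: str = field(default_factory=lambda: _env("DATAMINR_CLIENT_ID"))
    dataminr_client_secret: str = field(default_factory=lambda: _env("DATAMINR_CLIENT_SECRET"))
    clear_api_url: str = field(default_factory=lambda: _env("CLEAR_API_URL"))
    clear_api_key: str = field(default_factory=lambda: _env("CLEAR_API_KEY"))
    anthropic_api_key: str = field(default_factory=lambda: _env("ANTHROPIC_API_KEY"))
    redis_url: str = field(default_factory=lambda: _env("REDIS_URL"))
    poll_interval_seconds: int = field(default_factory=lambda: int(_env("POLL_INTERVAL_SECONDS", "15")))
    relevance_threshold: float = field(default_factory=lambda: float(_env("RELEVANCE_THRESHOLD", "0.5")))
```

Using `default_factory` means the environment is read when `Settings()` is constructed, so tests can override variables before instantiation.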
- Poll — Celery beat triggers `poll_dataminr` every N seconds
- Fetch — Gets signals from Dataminr within a time window (last synced → now)
- Ingest — Each signal is mapped and saved via the `createSignal` mutation
- Classify — ML classifies: disaster types, relevance, severity
- Group — If relevant, ML clusters the signal into an existing or new event
- Escalate — If severity >= 4, assesses for alert creation (always `draft`)
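The steps above can be sketched end to end. This is an illustrative outline of the per-signal flow, not the actual `process_signal` task; the `classify`/`group`/`assess` callables stand in for the ML prompt calls:

```python
RELEVANCE_THRESHOLD = 0.5   # default from config
SEVERITY_ALERT_FLOOR = 4

def process_signal(signal, classify, group, assess):
    """Illustrative per-signal flow: classify, then group if relevant,
    then escalate to a draft alert if severe enough."""
    classification = classify(signal)  # disaster types, relevance, severity
    result = {"classification": classification, "event": None, "alert": None}
    if classification["relevance"] < RELEVANCE_THRESHOLD:
        return result  # ingested only; no event created
    result["event"] = group(signal, classification)  # existing or new event
    if classification["severity"] >= SEVERITY_ALERT_FLOOR:
        alert = assess(result["event"])
        if alert:
            alert["status"] = "draft"  # alerts are always created as drafts
            result["alert"] = alert
    return result
```

Note the two gates: relevance controls whether an event is created at all, and severity controls whether alert assessment even runs.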
```
src/
├── config.py            # Settings from .env
├── celery_app.py        # Celery app + beat schedule
├── tasks/
│   ├── poll.py          # poll_dataminr task
│   └── process.py       # process_signal task
├── clients/
│   ├── dataminr.py      # Dataminr API (auth + fetch)
│   ├── graphql.py       # CLEAR API mutations/queries
│   └── claude.py        # ML inference client
├── models/
│   ├── dataminr.py      # Dataminr response schemas
│   └── clear.py         # GraphQL inputs + ML output schemas
├── services/
│   ├── signal.py        # Signal field mapping + ingestion
│   ├── event.py         # Event grouping
│   ├── alert.py         # Alert escalation
│   └── geo.py           # Location resolution
└── prompts/
    ├── classify.py      # Signal classification
    ├── group.py         # Event grouping
    └── assess.py        # Alert assessment
```
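Tying the layout together, `celery_app.py` holds the beat schedule that drives polling. A minimal config sketch, assuming the task path `src.tasks.poll.poll_dataminr` and the `REDIS_URL`/`POLL_INTERVAL_SECONDS` variables listed above; the real module may configure more options:

```python
import os
from celery import Celery

# Hypothetical sketch of src/celery_app.py; names mirror the layout above.
app = Celery("pipeline", broker=os.getenv("REDIS_URL", "redis://localhost:6379/0"))

app.conf.beat_schedule = {
    # Celery beat fires poll_dataminr every POLL_INTERVAL_SECONDS
    "poll-dataminr": {
        "task": "src.tasks.poll.poll_dataminr",
        "schedule": float(os.getenv("POLL_INTERVAL_SECONDS", "15")),
    },
}
```

Running `celery -A src.celery_app worker --beat` then starts both the worker and the scheduler in one process, as shown in the setup commands.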