News discovery, review, and public-release automation for the TFHT Enforcement Index
(denbust), a public-interest data project supporting the Task Force on Human Trafficking
and Prostitution.
Created by Shay Palachy Affek.
The Enforcement Index monitors Israeli news and source material for enforcement activity related to
human trafficking, prostitution, brothels, pimping, exploitation, and adjacent legal action. The
repository contains the denbust Python package, operator configs, GitHub Actions workflows,
review tools, and release tooling used to turn incoming source material into structured,
review-gated news_items data.
The public-facing name is TFHT Enforcement Index. The package and CLI remain named denbust for
backward compatibility.
The primary implemented dataset is news_items. It supports:
- daily and weekly source discovery,
- candidate scraping and backfill,
- LLM-assisted relevance and taxonomy classification,
- human review and suppression gates,
- monthly report generation,
- metadata-only public release bundles,
- optional Kaggle, Hugging Face, Google Drive, and S3-compatible publication paths.
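The review gate and the metadata-only release listed above can be pictured with a small sketch. This is only an illustration: the `NewsItem` class, its field names, and the `"approved"` status value are hypothetical and are not taken from the real news_items schema or the denbust code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NewsItem:
    # Hypothetical fields for illustration only; not the real news_items schema.
    url: str
    title: str
    body_text: str
    review_status: str  # assumed values: "approved", "pending", "rejected"
    suppressed: bool


def release_rows(items):
    """Keep approved, unsuppressed items and emit metadata only (no full text)."""
    return [
        {"url": item.url, "title": item.title}
        for item in items
        if item.review_status == "approved" and not item.suppressed
    ]
```

The point of the sketch is the ordering: an item has to pass both the review check and the suppression check before any of its metadata reaches a release bundle, and the article body is never copied into the output rows.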
Additional dataset ideas such as docs_metadata, open_docs_fulltext, and events are planned
but are not part of the current production surface.

    flowchart LR
        sources["News sources and search engines"] --> discovery["Discovery and candidate queue"]
        discovery --> scrape["Scrape and normalize"]
        scrape --> classify["LLM classification and TFHT taxonomy"]
        classify --> review["Review, privacy, suppression gates"]
        review --> store["Supabase operational store"]
        store --> reports["Monthly reports"]
        store --> release["Metadata-only public release bundle"]
        discovery <--> state["tfht_enforce_idx_state"]

Generated mutable state lives in the companion state repository, DataHackIL/tfht_enforce_idx_state. This repository contains the code, workflows, schemas, and docs.
Install the package for local development:

    pip install -e ".[dev]"
    python -m playwright install chromium

Run the local news ingest config:

    denbust scan --config agents/news/local.yaml

Run discovery diagnostics:

    denbust diagnose-discovery --config agents/news/local.yaml

Run the validation set lint:

    denbust validation-lint --validation-set validation/news_items/classifier_validation.csv

Live runs require provider credentials and operational configuration. Keep secrets in local environment files or GitHub Actions secrets; do not commit them.

    denbust run --dataset news_items --job discover --config agents/news/local.yaml
    denbust run --dataset news_items --job backfill_discover --config agents/news/local.yaml
    denbust run --dataset news_items --job backfill_scrape --config agents/news/local.yaml
    denbust run --dataset news_items --job ingest --config agents/news/local.yaml
    denbust report monthly --month 2026-03 --config agents/news/local.yaml
    denbust release --dataset news_items --config agents/release/news_items.yaml
    denbust backup --dataset news_items --config agents/backup/news_items.yaml

Local runs default to repo-local state under data/. You can override the local state layout with:
- DENBUST_STATE_ROOT
- DENBUST_STORE_PATH
- DENBUST_RUNS_DIR
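If you script local runs from Python, the job commands and the state override can be assembled as below. This is a minimal sketch and not part of the denbust package: `build_job_command` and `build_env` are hypothetical helpers, they use only the flags shown in the commands above, and they assume that DENBUST_STATE_ROOT accepts a directory path.

```python
import os


def build_job_command(job, config="agents/news/local.yaml", dataset="news_items"):
    """Return the argv list for one `denbust run` invocation."""
    return ["denbust", "run", "--dataset", dataset, "--job", job, "--config", config]


def build_env(state_root=None, base_env=None):
    """Copy an environment mapping, optionally setting DENBUST_STATE_ROOT.

    Assumption: DENBUST_STATE_ROOT takes a directory path.
    """
    env = dict(os.environ if base_env is None else base_env)
    if state_root is not None:
        env["DENBUST_STATE_ROOT"] = str(state_root)
    return env
```

With the package installed, the two pieces could be passed to `subprocess.run(build_job_command("discover"), env=build_env("/tmp/denbust-state"), check=True)`. Building the command as a list avoids shell quoting problems, and copying the environment keeps the override scoped to that one run.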
GitHub Actions runs check out the companion state repo and write namespaced state under
news_items/<job>/. The important production jobs are:
- news_items / discover
- news_items / ingest
- news_items / backfill_discover
- news_items / backfill_scrape
- news_items / monthly_report
- news_items / release
- news_items / backup
For the detailed state layout and runbook, see docs/discovery_operations.md.
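As a rough illustration of the news_items/<job>/ namespacing described above, the per-job state directory can be derived from the job name. The helper `job_state_dir` is hypothetical, and it is an assumption that every job listed above writes state under its own directory; docs/discovery_operations.md remains the authoritative description of the layout.

```python
from pathlib import PurePosixPath

# Job names taken from the production job list above.
KNOWN_JOBS = (
    "discover",
    "ingest",
    "backfill_discover",
    "backfill_scrape",
    "monthly_report",
    "release",
    "backup",
)


def job_state_dir(state_root, job, dataset="news_items"):
    """Return <state_root>/<dataset>/<job> for a known job name."""
    if job not in KNOWN_JOBS:
        raise ValueError(f"unknown job: {job!r}")
    return PurePosixPath(state_root) / dataset / job
```

For example, `job_state_dir("tfht_enforce_idx_state", "discover")` yields the path `tfht_enforce_idx_state/news_items/discover`. Rejecting unknown job names early guards against a typo silently creating a new state directory.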
| Path | Purpose |
|---|---|
| src/denbust/ | Python package and CLI implementation |
| agents/ | Local and GitHub Actions job configs |
| .github/workflows/ | Scheduled and manual operational workflows |
| supabase/migrations/ | Operational database schema |
| review_app/ | Candidate and news-item review workbench |
| ingest_app/ | Manual ingest workbench |
| validation/ | Classifier validation data |
| docs/ | Product definition, operations docs, plans, and evidence reports |
- Product definition
- Discovery operations runbook
- Review workbench reference
- Labeling semantics
- Detailed operational reference
This repository is released under the MIT License.