scottjoyner/auto-ingest

Auto-Ingest System

Containerized pipeline that ingests dashcam, audio, and bodycam recordings into a Neo4j knowledge graph. It provides distributed workers, a NAS drop-folder job queue, a REST API for triggering jobs, and scheduled cron workers.

Documentation

Document                   Description
System Design              Architecture, service topology, data model, deployment guide
Deployment Runbook         File-by-file operational guide
Content OS Architecture    Content workflow engine design
Docker Skill               Deploy, manage, and monitor Docker services
Job Management Skill       Enqueue jobs, manage the queue, API endpoints
Troubleshooting Skill      Known issues, diagnostics, fixes

Quick Start

cd /home/scott/git/auto-ingest

# Configure environment
cp deploy/path_profiles.env.example .env
# Edit .env for your host

# Build and start all services
docker compose up -d --build

# Verify
docker compose ps
curl http://localhost:8766/api/health
curl http://localhost:8766/api/status
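
Containers can take a little while to come up after `docker compose up -d --build`, so a script may prefer to poll the health endpoint instead of checking it once. The following is a minimal Python sketch. It assumes only that `/api/health` returns HTTP 200 once `job-api` is ready; the response body is not documented here, so it is ignored. `wait_for_health` and its parameters are hypothetical and not part of the repository.

```python
import time
import urllib.error
import urllib.request

HEALTH_URL = "http://localhost:8766/api/health"  # from the Quick Start above


def wait_for_health(url=HEALTH_URL, attempts=30, delay=2.0,
                    opener=urllib.request.urlopen, sleep=time.sleep):
    """Poll `url` until it answers HTTP 200.

    Returns True on success, False once every attempt has failed.
    Assumption: a 200 status means healthy; the body is not inspected.
    """
    for attempt in range(attempts):
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # URLError/HTTPError (refused, unreachable, error status) and
            # socket timeouts are all OSError subclasses: retry.
            pass
        if attempt < attempts - 1:
            sleep(delay)  # no point sleeping after the final attempt
    return False
```

Call `wait_for_health()` right after `docker compose up -d --build`. The `opener` and `sleep` arguments exist only so that the loop can be exercised without a live server.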

Trigger Jobs

# Via HTTP API
curl -X POST http://localhost:8766/api/enqueue \
  -H 'Content-Type: application/json' \
  -d '{"kind": "dashcam"}'

# Via shell script
./deploy/create_job.sh dashcam
./deploy/create_job.sh audio
./deploy/create_job.sh bodycam
./deploy/create_job.sh all

# Check status
curl http://localhost:8766/api/status
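
The curl call above can also be made from Python using only the standard library. The sketch below builds the identical POST. The set of accepted kinds is an assumption: it mirrors the `create_job.sh` arguments, including `all`, which the HTTP API may or may not accept. `build_enqueue_request` and `enqueue` are illustrative names.

```python
import json
import urllib.request

API_BASE = "http://localhost:8766"
# Kinds accepted by deploy/create_job.sh; assumed, not confirmed, to be
# the same set the HTTP API accepts.
KNOWN_KINDS = {"dashcam", "audio", "bodycam", "all"}


def build_enqueue_request(kind, base=API_BASE):
    """Build the same POST /api/enqueue request that the curl example sends."""
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unknown job kind: {kind!r}")
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/api/enqueue",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def enqueue(kind, base=API_BASE, timeout=10):
    """Send the request and return (HTTP status, raw response text).

    The response schema is not documented, so the body is not parsed.
    """
    with urllib.request.urlopen(build_enqueue_request(kind, base), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8", errors="replace")
```

For example, `enqueue("audio")` returns the status code and the body as text, which can then be compared against `/api/status`.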

Services

Service          Port        Poll     Purpose
job-api          8766        -        HTTP API for enqueueing jobs
ingest-service   -           5 min    Runs run_ingest_all.sh in a loop
ingest-worker    -           30s      Claims .job files from /nas/drop/
sync-service     -           10 min   Syncs the legacy drop from deathstar
content-service  -           30 min   Content OS CLI status loop
ingest-cron      -           5 min    Scheduled ingest cron
content-cron     -           30 min   Scheduled content cron
neo4j            7474/7687   -        Graph database (20M+ nodes)
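
The Poll column lists services that repeat work on a fixed interval. This README does not show how the containers implement their loops. If a loop is written as `task; sleep N`, each run starts N seconds after the previous one *finished*, so a slow run pushes the whole schedule back. The following hedged Python sketch shows a fixed-schedule alternative. `run_every` is hypothetical and does not describe the services' actual code.

```python
import time


def run_every(task, interval_s, iterations=None,
              clock=time.monotonic, sleep=time.sleep):
    """Run task() on a fixed schedule: start times are 0, interval_s,
    2*interval_s, ... after the first call, regardless of run duration.
    Ticks missed while a run overran are skipped, not replayed.
    iterations=None loops forever; clock/sleep are injectable for tests."""
    next_run = clock()
    done = 0
    while True:
        task()
        done += 1
        if iterations is not None and done >= iterations:
            return
        next_run += interval_s
        now = clock()
        if now > next_run:
            # Overran one or more ticks: jump to the first tick still ahead.
            missed = int((now - next_run) // interval_s) + 1
            next_run += missed * interval_s
        sleep(next_run - now)
```

With `interval_s=300` this matches the 5-minute cadence listed for `ingest-service`. After a long run, missed ticks are skipped rather than replayed, so an overrun does not trigger a burst of catch-up runs.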

Architecture

+-------------------+     +------------------+     +------------------+
|  Job Trigger API  |     |  Ingest Service  |     |  Sync Service    |
|  (HTTP on :8766)  |     |  (loop 5 min)    |     |  (loop 10 min)   |
+-------------------+     +------------------+     +------------------+
          |                        |                        |
          v                        v                        v
+-------------------+     +------------------+     +------------------+
|  .job Queue       |<----|  Ingest Worker   |<----|  Legacy Drop     |
|  /nas/drop/       |     |  (loop 30s)      |     |  /incoming/      |
|   claimed/        |     |                  |     |   deathstar/     |
|   done/           |     +------------------+     +------------------+
|   failed/         |         |             |
+-------------------+         |             |
                              v             v
                +------------------+     +-----------------+
                |  Neo4j Graph DB  |     |  Content OS     |
                |  :7687 (:7474)   |     |  (cron 30 min)  |
                +------------------+     +-----------------+
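
The queue layout in the diagram (`/nas/drop/` with `claimed/`, `done/`, and `failed/` subdirectories) fits a claim-by-rename protocol. The README does not say how `deploy/worker_ingest.sh` claims files, so the following Python sketch only illustrates that assumption. `claim_next_job` and `finish_job` are hypothetical names. The atomicity argument holds within a single POSIX filesystem; rename semantics on NFS/SMB mounts of the NAS may be weaker.

```python
import os
from pathlib import Path


def claim_next_job(drop_dir):
    """Try to claim one *.job file by renaming it into drop_dir/claimed/.

    os.rename is atomic within one POSIX filesystem. When several workers
    race for the same file, exactly one rename succeeds and the others get
    FileNotFoundError and move on. Returns the claimed path, or None.
    """
    drop = Path(drop_dir)
    claimed = drop / "claimed"
    claimed.mkdir(exist_ok=True)
    for job in sorted(drop.glob("*.job")):  # name order is an assumption
        target = claimed / job.name
        try:
            os.rename(job, target)
        except FileNotFoundError:
            continue  # another worker claimed it first
        return target
    return None


def finish_job(claimed_path, ok):
    """Move a claimed job into done/ or failed/ (siblings of claimed/)."""
    claimed_path = Path(claimed_path)
    dest_dir = claimed_path.parent.parent / ("done" if ok else "failed")
    dest_dir.mkdir(exist_ok=True)
    dest = dest_dir / claimed_path.name
    os.rename(claimed_path, dest)
    return dest
```

A worker loop would call `claim_next_job("/nas/drop")` on each 30 s poll, run the ingest for the claimed file, and then call `finish_job(path, ok=...)`.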

Data Model

Core Neo4j node types:

Label              Count   Description
PhoneLog           20M     Phone call/SMS records
DashcamEmbedding   4.2M    Dashcam video embeddings
YOLODetection      4.1M    Vehicle/object detections
Frame              3.7M    Video frames
Utterance          420K    Speech utterances
Segment            361K    Transcript segments
Speaker            233K    Speaker entities
Transcription      64K     Transcription records
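
The counts above are rounded. To refresh them, a label-scoped count such as `MATCH (n:Frame) RETURN count(n)` can usually be answered from Neo4j's count store. The `labels(n)` grouping query under Management, by contrast, has to visit every node. The following Python sketch builds one `UNION ALL` query over the labels in this table, plus the matching `cypher-shell` argv. Both function names are hypothetical, and the password must be supplied by the caller.

```python
import re

# Labels from the Data Model table above.
CORE_LABELS = ["PhoneLog", "DashcamEmbedding", "YOLODetection", "Frame",
               "Utterance", "Segment", "Speaker", "Transcription"]

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def label_count_query(labels=CORE_LABELS):
    """Build one Cypher query that returns a (label, cnt) row per label."""
    parts = []
    for label in labels:
        # Only plain identifiers are accepted, so no quoting/escaping is needed.
        if not _IDENT.fullmatch(label):
            raise ValueError(f"unexpected label name: {label!r}")
        parts.append(f"MATCH (n:`{label}`) RETURN '{label}' AS label, count(n) AS cnt")
    return "\nUNION ALL\n".join(parts)


def cypher_shell_argv(query, user="neo4j", password="", container="neo4j"):
    """argv for the same docker exec / cypher-shell call shown under Management."""
    return ["docker", "exec", container, "cypher-shell", "-u", user, "-p", password, query]
```

You can run it with `subprocess.run(cypher_shell_argv(label_count_query(), password=...), check=True)`. This passes the query as a single argv element and bypasses the shell. That matters here: inside a double-quoted bash string, the backticks around the labels would be treated as command substitution.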

Key Files

File                                    Purpose
docker-compose.yml                      Service definitions
Dockerfile                              Container image
deploy/job_trigger_api.py               HTTP API server
deploy/worker_ingest.sh                 Distributed worker
deploy/sync_from_legacy_drop.sh         Legacy sync
deploy/create_job.sh                    Job creation helper
deploy/start-cron.sh                    Cron daemon starter
deploy/cron/ingest.crontab              Ingest schedule
deploy/cron/content_generation.crontab  Content schedule
deploy/path_profiles.env.example        Environment template
run_ingest_all.sh                       Main ingest runner
ingest_transcriptsv5_3.py               Python ingest script

Management

cd /home/scott/git/auto-ingest

# Start/stop
docker compose up -d          # start all
docker compose down           # stop all
docker compose up -d --build  # rebuild and start

# Logs
docker compose logs -f ingest-worker
docker compose logs -f ingest-service
docker compose logs -f sync-service

# Neo4j
docker exec neo4j cypher-shell -u neo4j -p knowledge_graph_2026 "RETURN 1"
docker exec neo4j cypher-shell -u neo4j -p knowledge_graph_2026 \
  "MATCH (n) RETURN labels(n) AS label, count(*) AS cnt ORDER BY cnt DESC LIMIT 10"

# Queue
ls -lah /nas/drop/ /nas/drop/claimed/ /nas/drop/done/ /nas/drop/failed/
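
For a quick health view of the queue beyond `ls`, a small Python sketch can count the `.job` files in each directory and report the age of the oldest file in `claimed/`, which helps spot a worker that died mid-job. It assumes the directory layout listed above, and `queue_snapshot` is hypothetical. The age comes from the file's mtime, and a rename into `claimed/` does not update mtime, so the value measures time since the job file was written, not since it was claimed.

```python
import time
from pathlib import Path

# Queue directories; "" is the drop root itself (pending jobs).
QUEUE_DIRS = {"pending": "", "claimed": "claimed", "done": "done", "failed": "failed"}


def queue_snapshot(root="/nas/drop", now=None):
    """Return ({state: number of *.job files}, age in seconds of the oldest
    file in claimed/, or None if claimed/ is empty or missing)."""
    now = time.time() if now is None else now
    files = {}
    for state, sub in QUEUE_DIRS.items():
        d = Path(root, sub)
        files[state] = list(d.glob("*.job")) if d.is_dir() else []
    counts = {state: len(paths) for state, paths in files.items()}
    # mtime = when the job file was last written; renaming does not change it.
    oldest_age = max((now - p.stat().st_mtime for p in files["claimed"]), default=None)
    return counts, oldest_age
```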

Troubleshooting

See the Troubleshooting Skill document for:

  • Neo4j connection failures
  • libGL/libvpx errors
  • Legacy drop sync issues
  • Job queue problems
  • Cron job failures
  • Diagnostic commands
