MMMeta is a local-first, production-oriented multimedia metadata generation platform for ingesting large media archives, normalizing discoverable metadata into structured JSON, enriching assets with derivative AI metadata, indexing artifacts for search, and exporting interoperable archives.
- FastAPI API server with OpenAPI docs, health endpoints, metrics, API key auth, optional JWT auth, and WebSocket job progress streaming
- Typer CLI with ingest, process, watch, search, export, validate, providers, pipelines, jobs, embeddings, subtitles, summarize, transcribe, analyze, config, migrate, stats, dedupe, and benchmark commands
- Async SQLAlchemy 2.x persistence with SQLite by default and PostgreSQL support through configuration
- Resumable job queue with persistent state, retries, cancellation, incremental processing, content hashing, and deduplication
- Extensible plugin SDK for custom extractors, providers, exporters, vector stores, pipelines, and enrichment steps
- Search stack with SQLite FTS5 plus vector similarity abstraction
- Local artifact storage with content-addressable layout
- Built-in subtitle parsing, JSON metadata mapping, image metadata extraction, audio heuristics, and video sidecar inspection
- Docker, Compose, Alembic, tests, sample plugin, sample data, and React/Vite/Tailwind UI
python -m venv .venv
.venv\Scripts\activate
pip install -e .[dev,parquet]
copy .env.example .env
mmmeta config show
mmmeta migrate
mmmeta ingest examples\sample_data
mmmeta process run
mmmeta api serve --host 127.0.0.1 --port 8080Open:
- API docs: http://127.0.0.1:8080/docs
- Metrics: http://127.0.0.1:8080/metrics
src/mmmeta/
api/ FastAPI application and routers
cli/ Typer CLI
core/ Configuration, logging, metrics, utilities
db/ Engine, sessions, initialization helpers
exporters/ JSON, NDJSON, CSV, Parquet, Markdown exporters
extractors/ Metadata extraction and parsing modules
legacy/ Adapters for existing subtitle workflows
models/ SQLAlchemy ORM models
pipelines/ DAG pipeline framework and built-ins
plugins/ SDK and plugin discovery
providers/ OpenAI-compatible provider abstraction
schemas/ Pydantic v2 models and JSON schema exports
security/ API key, JWT, rate limiting
storage/ Artifact storage backends
vectorstores/ Embedding storage and similarity search
workers/ Persistent queue execution
ingestscans directories recursively, hashes supported files, records assets, and captures sidecar metadata.process runexecutes the built-in pipeline graph against queued assets.searchperforms full-text, semantic, or hybrid search over normalized metadata.exportwrites normalized datasets as JSON, NDJSON, CSV, Parquet, Markdown, or SQLite snapshots.watchmonitors directories and auto-enqueues new or changed assets.
The original convert_srts_to_metadata.py flow is preserved as a legacy adapter. The new system can ingest the same SRT and .info.json companion layout while routing derivative generation through the configurable provider layer.
ruff check .
mypy src
pytest