slammingprogramming/multimedia-metadata-generation

MMMeta

MMMeta is a local-first, production-oriented multimedia metadata generation platform for ingesting large media archives, normalizing discoverable metadata into structured JSON, enriching assets with derivative AI metadata, indexing artifacts for search, and exporting interoperable archives.

Highlights

  • FastAPI API server with OpenAPI docs, health endpoints, metrics, API key auth, optional JWT auth, and WebSocket job progress streaming
  • Typer CLI with ingest, process, watch, search, export, validate, providers, pipelines, jobs, embeddings, subtitles, summarize, transcribe, analyze, config, migrate, stats, dedupe, and benchmark commands
  • Async SQLAlchemy 2.x persistence with SQLite by default and PostgreSQL support through configuration
  • Resumable job queue with persistent state, retries, cancellation, incremental processing, content hashing, and deduplication
  • Extensible plugin SDK for custom extractors, providers, exporters, vector stores, pipelines, and enrichment steps
  • Search stack with SQLite FTS5 plus vector similarity abstraction
  • Local artifact storage with content-addressable layout
  • Built-in subtitle parsing, JSON metadata mapping, image metadata extraction, audio heuristics, and video sidecar inspection
  • Docker, Compose, Alembic, tests, sample plugin, sample data, and React/Vite/Tailwind UI
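
The content hashing, deduplication, and content-addressable storage items above can be illustrated with a minimal sketch. It is not MMMeta's actual storage backend: the function names `content_hash` and `store_artifact`, the choice of SHA-256, and the two-level `ab/cd/<digest>` directory layout are all assumptions made for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_artifact(source: Path, root: Path) -> tuple[Path, bool]:
    """Copy `source` into a content-addressable tree under `root`.

    Hypothetical layout: root/<first 2 hex chars>/<next 2 hex chars>/<digest>.
    Returns the stored path and True if a new copy was written, or False if
    identical content was already present (deduplication).
    """
    digest = content_hash(source)
    target = root / digest[:2] / digest[2:4] / digest
    if target.exists():
        return target, False
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, target)
    return target, True
```

Because the stored path is derived only from the file's bytes, two files with identical content resolve to the same location regardless of their original names, which is what makes deduplication a simple existence check in this sketch.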

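Quick Start

The commands below use Windows shell syntax. On Linux or macOS, activate the environment with source .venv/bin/activate, use cp instead of copy, and write paths with forward slashes.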

python -m venv .venv
.venv\Scripts\activate
pip install -e .[dev,parquet]
copy .env.example .env
mmmeta config show
mmmeta migrate
mmmeta ingest examples\sample_data
mmmeta process run
mmmeta api serve --host 127.0.0.1 --port 8080

Open:

Repository Layout

src/mmmeta/
  api/           FastAPI application and routers
  cli/           Typer CLI
  core/          Configuration, logging, metrics, utilities
  db/            Engine, sessions, initialization helpers
  exporters/     JSON, NDJSON, CSV, Parquet, Markdown exporters
  extractors/    Metadata extraction and parsing modules
  legacy/        Adapters for existing subtitle workflows
  models/        SQLAlchemy ORM models
  pipelines/     DAG pipeline framework and built-ins
  plugins/       SDK and plugin discovery
  providers/     OpenAI-compatible provider abstraction
  schemas/       Pydantic v2 models and JSON schema exports
  security/      API key, JWT, rate limiting
  storage/       Artifact storage backends
  vectorstores/  Embedding storage and similarity search
  workers/       Persistent queue execution

Core Workflows

  1. ingest scans directories recursively, hashes supported files, records assets, and captures sidecar metadata.
  2. process run executes the built-in pipeline graph against queued assets.
  3. search performs full-text, semantic, or hybrid search over normalized metadata.
  4. export writes normalized datasets as JSON, NDJSON, CSV, Parquet, Markdown, or SQLite snapshots.
  5. watch monitors directories and auto-enqueues new or changed assets.
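
As a rough illustration of the ingest and watch steps above, the following sketch scans a directory recursively, hashes supported files, pairs each one with its sidecar files, and selects only new or changed assets for queuing. The extension sets, the `ScannedAsset` dataclass, and the `scan` and `select_for_queue` functions are hypothetical and do not reflect MMMeta's internal API.

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path

# Hypothetical sets; the real list of supported formats is not stated here.
MEDIA_EXTENSIONS = {".mp4", ".mkv", ".mp3", ".wav", ".jpg", ".png"}
SIDECAR_SUFFIXES = (".srt", ".info.json")


@dataclass
class ScannedAsset:
    path: Path
    sha256: str
    sidecars: list[Path] = field(default_factory=list)


def scan(root: Path) -> list[ScannedAsset]:
    """Walk `root` recursively and describe every supported media file."""
    assets = []
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in MEDIA_EXTENSIONS:
            continue
        # Reads the whole file at once for brevity; a real ingest step
        # would stream large media files in chunks.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        stem = path.with_suffix("")  # clips/talk.mp4 -> clips/talk
        sidecars = []
        for suffix in SIDECAR_SUFFIXES:
            candidate = Path(f"{stem}{suffix}")
            if candidate.is_file():
                sidecars.append(candidate)
        assets.append(ScannedAsset(path, digest, sidecars))
    return assets


def select_for_queue(
    assets: list[ScannedAsset], known_hashes: dict[str, str]
) -> list[ScannedAsset]:
    """Keep assets that are new or whose hash differs from the stored one."""
    return [a for a in assets if known_hashes.get(str(a.path)) != a.sha256]
```

Comparing the fresh hash against a previously recorded one is one simple way to get incremental behavior: unchanged files are skipped, while new or edited files are returned for enqueueing.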

Legacy Compatibility

The original convert_srts_to_metadata.py flow is preserved as a legacy adapter. The new system can ingest the same SRT and .info.json companion layout while routing derivative generation through the configurable provider layer.
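
To make the companion layout concrete, here is a minimal sketch of reading an SRT file and an .info.json file that sit next to a media file. It is not the code of convert_srts_to_metadata.py or of the legacy adapter; the `parse_srt` and `load_companions` functions, the output keys, and the `title` field read from the JSON are assumptions for illustration.

```python
import json
import re
from pathlib import Path

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def parse_srt(text: str) -> list[dict]:
    """Parse SRT text into cues with start/end seconds and joined text."""
    cues = []
    normalized = text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        for index, line in enumerate(lines):
            match = TIMING.search(line)
            if match is None:
                continue
            h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
            body = [part.strip() for part in lines[index + 1:] if part.strip()]
            cues.append({
                "start": h1 * 3600 + m1 * 60 + s1 + ms1 / 1000,
                "end": h2 * 3600 + m2 * 60 + s2 + ms2 / 1000,
                "text": " ".join(body),
            })
            break  # one timing line per cue block
    return cues


def load_companions(media_path: Path) -> dict:
    """Collect subtitle cues and JSON metadata stored beside a media file."""
    stem = media_path.with_suffix("")
    srt_path = Path(f"{stem}.srt")
    info_path = Path(f"{stem}.info.json")
    info = {}
    if info_path.is_file():
        info = json.loads(info_path.read_text(encoding="utf-8"))
    cues = []
    if srt_path.is_file():
        cues = parse_srt(srt_path.read_text(encoding="utf-8"))
    return {
        "asset": media_path.name,
        "title": info.get("title"),  # assumed field name
        "cues": cues,
        "transcript": " ".join(cue["text"] for cue in cues),
    }
```

The resulting dictionary is the kind of normalized record that could then be handed to a provider for summarization or other derivative metadata.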

Running Tests

ruff check .
mypy src
pytest

Documentation

About

Modular multimedia metadata platform for ingesting, normalizing, enriching, searching, and exporting media intelligence. Includes FastAPI API, Typer CLI, async workers, subtitle/SRT pipelines, vector search, plugin SDK, Docker deployment, Alembic migrations, CI, and a React/Vite/Tailwind dashboard.
