lipsync-check

Detects audio-visual desync in video files using Gemma 4 via Ollama.

lipsync-check --model gemma4:e4b video.mp4
lipsync-check --model gemma4:e4b video.mp4 --quick --start 60 --duration 30
lipsync-check --model gemma4:e4b video.mp4 --json | jq '.verdict'

Set OLLAMA_MODEL=gemma4:e4b to skip --model on every command.

No cloud. Single binary.

Requirements

  • Ollama running locally with gemma4:e4b pulled
  • ffmpeg on your PATH

# Ollama model
ollama pull gemma4:e4b

# ffmpeg
brew install ffmpeg          # macOS
sudo apt install ffmpeg      # Ubuntu/Debian
sudo dnf install ffmpeg      # Fedora

Install

curl -fsSL https://raw.githubusercontent.com/Siddhant-K-code/lipsync-check/main/install.sh | bash

Installs lipsync-check to /usr/local/bin. Override with INSTALL_DIR=~/.local/bin.

Other install options

Via Go (go install names the binary after the last path element, so it is installed as tsi):

go install github.com/Siddhant-K-code/lipsync-check/cmd/tsi@latest

Build from source:

git clone https://github.com/Siddhant-K-code/lipsync-check
cd lipsync-check
go build -o lipsync-check ./cmd/tsi/

Manual download: Grab the binary for your platform from the Releases page, extract it, and move it to a directory on your $PATH.

Usage

lipsync-check --model gemma4:e4b video.mp4                                    # full analysis
lipsync-check --model gemma4:e4b video.mp4 --quick --start 0                  # single window from t=0s
lipsync-check --model gemma4:e4b video.mp4 --quick --start 60 --duration 45   # single 45s window from t=60s
lipsync-check --model gemma4:e4b video.mp4 --fps 2                            # higher frame rate
lipsync-check --model gemma4:e4b video.mp4 --json                             # raw JSON output
lipsync-check --model gemma4:e4b video.mp4 --host http://192.168.1.10:11434  # remote Ollama

All flags

Flag        Default                   Description
--model     gemma4:e4b                Ollama model
--host      http://localhost:11434    Ollama host
--fps       1                         Frames per second to extract
--window    30                        Window size in seconds
--quick     false                     Analyze a single window only
--start     0                         Start time in seconds (quick mode)
--duration  30                        Duration in seconds (quick mode)
--json      false                     Output raw JSON
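
To make the interplay of --window, --quick, --start and --duration concrete, here is a minimal sketch of how the analysis windows could be laid out. It is an illustration only, not the tool's actual code: the names Window, PlanWindows and QuickWindow are hypothetical, and clipping the last window to the clip length is an assumption.

```go
package main

import "fmt"

// Window is a time range in seconds (hypothetical type, not from the repo).
type Window struct {
	Start, End float64
}

// PlanWindows splits a clip of the given duration into consecutive windows
// of at most size seconds. The last window is clipped to the clip duration
// (an assumption about how a full analysis treats a short tail).
func PlanWindows(duration, size float64) []Window {
	var out []Window
	if duration <= 0 || size <= 0 {
		return out
	}
	for start := 0.0; start < duration; start += size {
		end := start + size
		if end > duration {
			end = duration
		}
		out = append(out, Window{Start: start, End: end})
	}
	return out
}

// QuickWindow returns the single window that quick mode would analyze,
// given the --start and --duration values.
func QuickWindow(start, duration float64) Window {
	return Window{Start: start, End: start + duration}
}

func main() {
	// Full analysis of a 90s clip with the default 30s window.
	for _, w := range PlanWindows(90, 30) {
		fmt.Printf("t=%.0fs to %.0fs\n", w.Start, w.End)
	}
	// Equivalent of --quick --start 60 --duration 45.
	q := QuickWindow(60, 45)
	fmt.Printf("quick: t=%.0fs to %.0fs\n", q.Start, q.End)
}
```

With the defaults, a 90-second clip yields three windows, which matches the three windows shown in the example output.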

Example output

  Temporal Sync Inspector
  Model: gemma4:e4b  |  Window: 30s  |  FPS: 1

  ✗ SIGNIFICANT DESYNC
  Sync score:  33.3%
  Windows:     3 analyzed, 2 with desync
  Desync at:   12.4s, 67.1s

  Per-window breakdown
  ────────────────────────────────────────────────────────────
  ✓ t=0s–30s   [high confidence]
    Audio and video appear synchronized.

  ✗ t=30s–60s  [high confidence]  [lip_sync]  (~180ms offset)
    Mouth movements lag behind audio by ~180ms.

  ✗ t=60s–90s  [medium confidence]  [action_sync]
    Clapping sounds precede visible hand contact by ~2 frames.

JSON schema

{
  "total_windows": 3,
  "desync_windows": 2,
  "sync_score": 33.3,
  "verdict": "SIGNIFICANT DESYNC",
  "desync_timestamps": [12.4, 67.1],
  "windows": [
    {
      "window_start_s": 0,
      "window_end_s": 30,
      "in_sync": true,
      "confidence": "high",
      "desync_detected_at_s": null,
      "desync_type": null,
      "estimated_offset_ms": null,
      "reasoning": "Audio and video appear synchronized.",
      "suspicious_frames": []
    }
  ]
}
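
For consuming the --json output from Go, a sketch of matching types might look like the following. The field names come from the JSON above; the Go type names, the use of pointers for fields that can be null, and the element type of suspicious_frames (left as raw JSON because the example only shows an empty array) are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WindowResult mirrors one entry of "windows". Pointer fields are nil
// when the JSON value is null.
type WindowResult struct {
	WindowStartS      float64           `json:"window_start_s"`
	WindowEndS        float64           `json:"window_end_s"`
	InSync            bool              `json:"in_sync"`
	Confidence        string            `json:"confidence"`
	DesyncDetectedAtS *float64          `json:"desync_detected_at_s"`
	DesyncType        *string           `json:"desync_type"`
	EstimatedOffsetMs *float64          `json:"estimated_offset_ms"`
	Reasoning         string            `json:"reasoning"`
	SuspiciousFrames  []json.RawMessage `json:"suspicious_frames"` // element type unknown
}

// Report mirrors the top-level object.
type Report struct {
	TotalWindows     int            `json:"total_windows"`
	DesyncWindows    int            `json:"desync_windows"`
	SyncScore        float64        `json:"sync_score"`
	Verdict          string         `json:"verdict"`
	DesyncTimestamps []float64      `json:"desync_timestamps"`
	Windows          []WindowResult `json:"windows"`
}

// ParseReport decodes the raw JSON output into a Report.
func ParseReport(data []byte) (Report, error) {
	var r Report
	err := json.Unmarshal(data, &r)
	return r, err
}

// sample is the JSON document shown above.
const sample = `{
  "total_windows": 3, "desync_windows": 2, "sync_score": 33.3,
  "verdict": "SIGNIFICANT DESYNC", "desync_timestamps": [12.4, 67.1],
  "windows": [{
    "window_start_s": 0, "window_end_s": 30, "in_sync": true,
    "confidence": "high", "desync_detected_at_s": null,
    "desync_type": null, "estimated_offset_ms": null,
    "reasoning": "Audio and video appear synchronized.",
    "suspicious_frames": []
  }]
}`

func main() {
	r, err := ParseReport([]byte(sample))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(r.Verdict, r.SyncScore)
	for _, w := range r.Windows {
		fmt.Printf("%.0fs to %.0fs in_sync=%v\n", w.WindowStartS, w.WindowEndS, w.InSync)
	}
}
```

Checking a pointer field against nil before dereferencing it distinguishes "no desync reported" from a zero offset.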

Project structure

lipsync-check/
├── cmd/tsi/main.go
└── internal/
    ├── extract/extract.go      # ffmpeg wrapper — frames + 16kHz WAV
    ├── ollama/client.go        # Ollama /api/chat multimodal client
    └── inspector/inspector.go  # windowed analysis + summary

License

MIT