Detects audio-visual desync in video files using Gemma 4 via Ollama.
lipsync-check --model gemma4:e4b video.mp4
lipsync-check --model gemma4:e4b video.mp4 --quick --start 60 --duration 30
lipsync-check --model gemma4:e4b video.mp4 --json | jq '.verdict'
Set OLLAMA_MODEL=gemma4:e4b to skip --model on every command.
No cloud. Single binary.
curl -fsSL https://raw.githubusercontent.com/Siddhant-K-code/lipsync-check/main/install.sh | bash

- Ollama running locally with gemma4:e4b pulled
- ffmpeg on your PATH
# Ollama model
ollama pull gemma4:e4b
# ffmpeg
brew install ffmpeg # macOS
sudo apt install ffmpeg # Ubuntu/Debian
sudo dnf install ffmpeg # Fedora
curl -fsSL https://raw.githubusercontent.com/Siddhant-K-code/lipsync-check/main/install.sh | bash
Installs lipsync-check to /usr/local/bin. Override with INSTALL_DIR=~/.local/bin.
Other install options
Via Go:
go install github.com/Siddhant-K-code/lipsync-check/cmd/tsi@latest
Build from source:
git clone https://github.com/Siddhant-K-code/lipsync-check
cd lipsync-check
go build -o lipsync-check ./cmd/tsi/
Manual download:
Grab the binary for your platform from the Releases page, extract it, and move it to a directory on your $PATH.
lipsync-check --model gemma4:e4b video.mp4 # full analysis
lipsync-check --model gemma4:e4b video.mp4 --quick --start 0 # single window from t=0s
lipsync-check --model gemma4:e4b video.mp4 --quick --start 60 --duration 45
lipsync-check --model gemma4:e4b video.mp4 --fps 2 # higher frame rate
lipsync-check --model gemma4:e4b video.mp4 --json # raw JSON output
lipsync-check --model gemma4:e4b video.mp4 --host http://192.168.1.10:11434 # remote Ollama
All flags
| Flag | Default | Description |
|---|---|---|
| `--model` | `gemma4:e4b` | Ollama model |
| `--host` | `http://localhost:11434` | Ollama host |
| `--fps` | `1` | Frames per second to extract |
| `--window` | `30` | Window size in seconds |
| `--quick` | false | Analyze a single window only |
| `--start` | `0` | Start time in seconds (quick mode) |
| `--duration` | `30` | Duration in seconds (quick mode) |
| `--json` | false | Output raw JSON |
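
To make the interaction between `--window` and `--fps` concrete, here is a minimal Go sketch of how a video could be split into fixed-size windows and how many frames each window would contribute. The `Window` type and the `planWindows` and `frameCount` helpers are hypothetical names for illustration; the tool's actual splitting logic, in particular how it treats a short final window, may differ.

```go
package main

import "fmt"

// Window is a half-open time range [Start, End) in seconds.
type Window struct {
	Start, End float64
}

// planWindows splits [0, total) into consecutive windows of at most
// size seconds. Hypothetical helper: the real tool may handle a short
// trailing window differently.
func planWindows(total, size float64) []Window {
	if total <= 0 || size <= 0 {
		return nil
	}
	var out []Window
	for start := 0.0; start < total; start += size {
		end := start + size
		if end > total {
			end = total // clamp the last window to the video length
		}
		out = append(out, Window{Start: start, End: end})
	}
	return out
}

// frameCount estimates how many frames are sampled from a window
// at the given extraction rate (frames per second).
func frameCount(w Window, fps float64) int {
	return int((w.End - w.Start) * fps)
}

func main() {
	// A 90-second video with the default --window 30 and --fps 1.
	for _, w := range planWindows(90, 30) {
		fmt.Printf("t=%gs-%gs: %d frames\n", w.Start, w.End, frameCount(w, 1))
	}
}
```

Under these assumptions, raising `--fps` increases the number of frames sent per window, while raising `--window` reduces the number of windows analyzed.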
Temporal Sync Inspector
Model: gemma4:e4b | Window: 30s | FPS: 1
✗ SIGNIFICANT DESYNC
Sync score: 33.3%
Windows: 3 analyzed, 2 with desync
Desync at: 12.4s, 67.1s
Per-window breakdown
────────────────────────────────────────────────────────────
✓ t=0s–30s [high confidence]
Audio and video appear synchronized.
✗ t=30s–60s [high confidence] [lip_sync] (~180ms offset)
Mouth movements lag behind audio by ~180ms.
✗ t=60s–90s [medium confidence] [action_sync]
Clapping sounds precede visible hand contact by ~2 frames.
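
The sample above is consistent with the sync score being the percentage of windows without desync: 1 of 3 windows is clean, giving 33.3%. The Go sketch below reproduces that arithmetic. The `syncScore` function is a hypothetical name, and the formula is inferred from the sample output rather than taken from the tool's source.

```go
package main

import (
	"fmt"
	"math"
)

// syncScore returns the percentage of windows without detected desync,
// rounded to one decimal place. Assumed formula, inferred from the
// sample output: (total - desync) / total * 100.
func syncScore(total, desync int) float64 {
	if total <= 0 {
		return 0
	}
	pct := float64(total-desync) / float64(total) * 100
	return math.Round(pct*10) / 10
}

func main() {
	// 3 windows analyzed, 2 with desync, as in the sample report.
	fmt.Printf("Sync score: %.1f%%\n", syncScore(3, 2))
}
```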
JSON schema
{
"total_windows": 3,
"desync_windows": 2,
"sync_score": 33.3,
"verdict": "SIGNIFICANT DESYNC",
"desync_timestamps": [12.4, 67.1],
"windows": [
{
"window_start_s": 0,
"window_end_s": 30,
"in_sync": true,
"confidence": "high",
"desync_detected_at_s": null,
"desync_type": null,
"estimated_offset_ms": null,
"reasoning": "Audio and video appear synchronized.",
"suspicious_frames": []
}
]
}
lipsync-check/
├── cmd/tsi/main.go
└── internal/
    ├── extract/extract.go       # ffmpeg wrapper: frames + 16kHz WAV
    ├── ollama/client.go         # Ollama /api/chat multimodal client
    └── inspector/inspector.go   # windowed analysis + summary
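
For consuming the `--json` output from another Go program, the report documented under JSON schema can be decoded into typed structs. This is a minimal sketch: the `Report` and `WindowResult` type names and the `parseReport` helper are hypothetical and are not taken from the tool's source. Fields shown as `null` in the schema are modeled as pointers, and `suspicious_frames` is kept as raw JSON because its element type is not shown.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WindowResult mirrors one entry of the "windows" array.
type WindowResult struct {
	WindowStartS      float64           `json:"window_start_s"`
	WindowEndS        float64           `json:"window_end_s"`
	InSync            bool              `json:"in_sync"`
	Confidence        string            `json:"confidence"`
	DesyncDetectedAtS *float64          `json:"desync_detected_at_s"` // nil when null
	DesyncType        *string           `json:"desync_type"`          // nil when null
	EstimatedOffsetMs *float64          `json:"estimated_offset_ms"`  // nil when null
	Reasoning         string            `json:"reasoning"`
	SuspiciousFrames  []json.RawMessage `json:"suspicious_frames"` // element type unknown
}

// Report mirrors the top-level JSON object.
type Report struct {
	TotalWindows     int            `json:"total_windows"`
	DesyncWindows    int            `json:"desync_windows"`
	SyncScore        float64        `json:"sync_score"`
	Verdict          string         `json:"verdict"`
	DesyncTimestamps []float64      `json:"desync_timestamps"`
	Windows          []WindowResult `json:"windows"`
}

// parseReport decodes raw --json output into a Report.
func parseReport(data []byte) (Report, error) {
	var r Report
	err := json.Unmarshal(data, &r)
	return r, err
}

// sample is the example document from the JSON schema section.
const sample = `{
  "total_windows": 3,
  "desync_windows": 2,
  "sync_score": 33.3,
  "verdict": "SIGNIFICANT DESYNC",
  "desync_timestamps": [12.4, 67.1],
  "windows": [
    {
      "window_start_s": 0,
      "window_end_s": 30,
      "in_sync": true,
      "confidence": "high",
      "desync_detected_at_s": null,
      "desync_type": null,
      "estimated_offset_ms": null,
      "reasoning": "Audio and video appear synchronized.",
      "suspicious_frames": []
    }
  ]
}`

func main() {
	r, err := parseReport([]byte(sample))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s (score %.1f)\n", r.Verdict, r.SyncScore)
	for _, w := range r.Windows {
		fmt.Printf("t=%gs-%gs in_sync=%t\n", w.WindowStartS, w.WindowEndS, w.InSync)
	}
}
```

In practice the input would come from the tool itself, for example by redirecting `lipsync-check --model gemma4:e4b video.mp4 --json` to a file and reading that file instead of the embedded sample.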
MIT