Audio Synchronisation

Marshalleq edited this page May 24, 2026 · 4 revisions

Audio synchronisation is the core innovation of the DDD Capture Toolkit. This page explains how the sync system works and how to get the best results.

The Problem

The Domesday Duplicator captures RF video from VHS tapes, but audio must be captured separately. This creates three major sync issues:

1. Different Start Times

RF capture and audio capture are started by different processes, possibly on different hardware. Even a few tens of milliseconds of difference produces a noticeable lip-sync error.

2. Clock Drift

RF and audio use independent clocks. Over a 2-hour tape, even tiny clock differences accumulate into seconds of drift.

3. VCR Speed Variation

Your VCR doesn't play at exactly 25fps (PAL) or 29.97fps (NTSC). It might be 25.0001fps, causing audio to slowly drift throughout playback.
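
To put rough numbers on issues 2 and 3, the sketch below computes the timing error that builds up over a 2-hour tape. The 100 ppm clock mismatch is an illustrative assumption for two unrelated oscillators, not a measurement from this toolkit; the 25.0001fps figure is the one used above.

```python
def drift_seconds(duration_s: float, nominal: float, actual: float) -> float:
    """Timing error accumulated over duration_s when a rate runs at `actual` instead of `nominal`."""
    return duration_s * (actual / nominal - 1.0)

TWO_HOURS = 2 * 60 * 60  # seconds

# Assumed 100 ppm mismatch between two independent capture clocks.
clock_drift = drift_seconds(TWO_HOURS, 1.0, 1.0 + 100e-6)

# VCR playing at 25.0001 fps instead of exactly 25 fps (PAL).
vcr_drift = drift_seconds(TWO_HOURS, 25.0, 25.0001)

print(f"clock drift: {clock_drift * 1000:.0f} ms")
print(f"VCR drift:   {vcr_drift * 1000:.1f} ms")
```

Under these assumptions the clock mismatch alone is worth roughly 0.7 seconds by the end of the tape, and the VCR speed error adds a few tens of milliseconds on top.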

Background: what people tried before this toolkit

Before the synchronised-capture system documented below existed, DdD users worked around the missing audio in three ways, all of which had real problems:

Capture the tape twice

Run the DdD once for video RF, then again while recording audio separately.

  • No two tape passes are identical — different tracking, different dropouts, different speed variations.
  • You still have to manually align the audio.
  • No reliable .tbc.json timing data that corresponds to the audio pass.
  • Doubles capture time and tape wear.

Audio to a separate device

Capture audio to a different recorder (e.g. a Mac via the player's audio outputs) while the DdD captures RF simultaneously.

  • Still no clock synchronisation between devices, so drift accumulates over the tape.
  • Still need to manually figure out the start offset.
  • Different sample rates and timing references; manual extraction and alignment workflow.

Two Domesday Duplicators

Use one DdD for video RF, another configured for audio RF capture.

  • Expensive — two DdDs.
  • The original DomesdayDuplicator software was GUI-only with no way to start both captures programmatically at exactly the same moment.
  • You still need to synchronise the clocks between the two units.
  • Audio still needs alignment and drift compensation.

The underlying blocker all three of these ran into was that the DomesdayDuplicator software had no command-line interface. Without programmatic capture control, true automation wasn't possible, and "two captures at the same moment" was always going to have an uncontrolled offset. The first step of this toolkit's solution was to add command-line flags to the DomesdayDuplicator software itself — see Foundation: command-line control below.

The Solution

The toolkit uses a three-part solution:

flowchart LR
    A["Clockgen Lite<br/>(synchronises capture clocks)"] --> B["Offset Calculation<br/>(measures audio_delay)"]
    B --> C["VhsDecodeAutoAudioAlign<br/>(corrects VCR speed wobble)"]
    C --> D["Perfectly Synced Audio<br/>(in _final.mkv)"]

Foundation: command-line control for DomesdayDuplicator

Before any of the three sync parts can do their job, both captures need to start on the same trigger. The original DomesdayDuplicator software was GUI-only — there was no way to start a capture programmatically, which made true synchronisation impossible. As part of this project, command-line flags were added to the DomesdayDuplicator software:

DomesdayDuplicator --start-capture --headless   # start capture, no GUI
DomesdayDuplicator --stop-capture               # stop capture programmatically

The capture script (ddd_clockgen_sync.py) uses these flags to start the RF capture and the audio capture together with precise timing. This is the foundation that makes everything else on this page possible.
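
As a rough illustration of what starting both captures on the same trigger can look like, here is a minimal sketch that releases two launches from a shared barrier. It is not the code in ddd_clockgen_sync.py. The DomesdayDuplicator flags are the ones shown above; the audio recorder command and the function name `start_together` are hypothetical placeholders.

```python
import subprocess
import threading

# Flags as documented on this page.
RF_CMD = ["DomesdayDuplicator", "--start-capture", "--headless"]
# Hypothetical placeholder: substitute whatever records your audio ADC.
AUDIO_CMD = ["your-audio-recorder", "--output", "ProjectName.flac"]

def start_together(commands, launch=subprocess.Popen):
    """Launch every command from its own thread, released by a shared barrier."""
    barrier = threading.Barrier(len(commands))
    results = [None] * len(commands)

    def worker(index, cmd):
        barrier.wait()                # block until every thread is ready
        results[index] = launch(cmd)  # then launch immediately

    threads = [
        threading.Thread(target=worker, args=(i, cmd))
        for i, cmd in enumerate(commands)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Usage (needs both programs installed):
# rf_proc, audio_proc = start_together([RF_CMD, AUDIO_CMD])
```

A barrier narrows the gap between the two launches but cannot remove it, because each program still takes its own time to begin recording. That residual gap is what Part 2 measures.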

Part 1: Clockgen Lite

Clockgen Lite is a hardware modification that clocks both the Domesday Duplicator and your audio ADC from the same master clock source.

Benefits:

  • Sample-level synchronisation
  • No clock drift between RF and audio
  • Wow and flutter captured identically in both streams

Part 2: Initial Offset Calculation

Even with synchronised clocks, the two capture processes start at slightly different times. The toolkit measures this offset using the VHS Timecode Calibration system:

  1. Generate a calibration video with embedded timecodes (visual + audio FSK)
  2. Record to VHS and capture back
  3. Analyse to compare where the same timecode appears in video vs audio
  4. Save the measured offset to configuration

The offset is typically 500-700ms and remains consistent for your hardware setup. Once calibrated, the offset is automatically applied during capture.
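
The comparison in step 3 can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's calibration code: it assumes each decoded timecode has already been reduced to "seconds into the capture" for both streams, that the offset is defined as audio time minus video time, and the function name and sample values are invented for the example.

```python
from statistics import median

def measure_audio_delay(video_hits, audio_hits):
    """Median of (audio time - video time) over timecodes seen in both streams.

    video_hits / audio_hits map a timecode label to seconds into the capture.
    """
    common = sorted(set(video_hits) & set(audio_hits))
    if not common:
        raise ValueError("no timecode was decoded in both video and audio")
    return median(audio_hits[tc] - video_hits[tc] for tc in common)

# Invented example readings, in seconds.
video = {"00:00:10": 12.480, "00:00:20": 22.481, "00:00:30": 32.479}
audio = {"00:00:10": 13.080, "00:00:20": 23.082, "00:00:30": 33.079}

audio_delay = measure_audio_delay(video, audio)
print(f"audio_delay = {audio_delay * 1000:.0f} ms")
```

Taking the median rather than the mean keeps a single misread timecode from skewing the stored value.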

Part 3: VhsDecodeAutoAudioAlign

VhsDecodeAutoAudioAlign by Rene Wolf handles VCR speed compensation:

How it works:

  1. Reads actual field timing from .tbc.json (produced by vhs-decode)
  2. Calculates exactly where each video field occurs in time
  3. Time-stretches/compresses audio to match actual video field boundaries
  4. Handles dropped fields (tape damage) by skipping corresponding audio

The maths:

Output sample 10000 → maps to input sample 9998.7 (via timing projection)

Nearest-neighbour interpolation is used. Speed differences are tiny, so sample skipping/duplication is rare and inaudible.
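
A minimal sketch of nearest-neighbour retiming, to make the mapping above concrete. It uses one constant speed ratio for the whole input, which is a simplification assumed for the example; per the steps above, the real tool projects timing per field from .tbc.json. The function names are hypothetical.

```python
import math

def nearest_input_index(out_index: int, ratio: float) -> int:
    """Project an output sample onto the input timeline and round to the nearest sample."""
    return math.floor(out_index * ratio + 0.5)

def retime_nearest(samples, ratio):
    """For each output sample, copy the nearest input sample.

    ratio is input samples per output sample (a constant here for simplicity).
    """
    n_out = int(len(samples) / ratio)
    last = len(samples) - 1
    return [samples[min(last, nearest_input_index(i, ratio))] for i in range(n_out)]

# Output sample 10000 projects to input position 9998.7, which rounds to 9999.
print(nearest_input_index(10000, 0.99987))
```

With a ratio this close to 1, almost every output sample maps to its own input sample; a skip or a duplicate occurs only once every several thousand samples.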

Using Audio Alignment

Prerequisites

For audio alignment to work, you need:

  1. Audio file - the .flac captured alongside the RF
  2. TBC JSON - the .tbc.json produced by the vhs-decode decode step
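
A quick pre-flight check for these two files might look like the sketch below. It assumes both files sit in one project directory and follow the ProjectName.flac / ProjectName.tbc.json naming used on this page; the function name is hypothetical.

```python
from pathlib import Path

def missing_alignment_inputs(project_dir, name):
    """Return the alignment inputs that do not exist yet (empty list means ready)."""
    required = [f"{name}.flac", f"{name}.tbc.json"]
    return [
        path
        for path in (Path(project_dir) / filename for filename in required)
        if not path.is_file()
    ]

# Usage:
# missing = missing_alignment_inputs("/captures/MyTape", "MyTape")
# if missing:
#     print("Cannot align yet, missing:", ", ".join(str(p) for p in missing))
```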

Starting Alignment

In the Workflow Control Centre, after decode is complete:

1A    # Start audio alignment for project 1

What Happens

  1. The alignment tool reads the .tbc.json for timing data
  2. It reads your captured .flac audio
  3. It stretches/compresses audio to match video timing
  4. It writes the result to an _aligned.flac file

Why FLAC, not WAV? The aligned output was originally .wav, which is fine for short tapes, but a classic WAV file caps at ~4 GB. At 24-bit stereo / 78125 Hz (the Clockgen Lite native rate) that limit is reached on tapes longer than ~2.5 hours, and the alignment output would be silently truncated. FLAC has no such limit, is read natively by final-mux and by Resolve (and similar NLEs) at the resampled output rate, and is lossless, so nothing is given up. The toolkit still detects existing _aligned.wav files from older runs for backward compatibility, but every new alignment produces _aligned.flac.
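
The ~2.5 hour figure can be checked with a few lines of arithmetic. The sketch assumes uncompressed stereo PCM and takes the limit as 2^32 bytes, which is where the 32-bit size fields of classic WAV run out; header overhead is ignored.

```python
WAV_LIMIT_BYTES = 2**32  # classic RIFF/WAV stores sizes in 32-bit fields

def hours_until_wav_limit(sample_rate, bits, channels=2):
    """Hours of uncompressed PCM that fit before a classic WAV file is full."""
    bytes_per_second = sample_rate * (bits // 8) * channels
    return WAV_LIMIT_BYTES / bytes_per_second / 3600

print(f"78125 Hz, 24-bit stereo: {hours_until_wav_limit(78125, 24):.2f} h")
print(f"96000 Hz, 24-bit stereo: {hours_until_wav_limit(96000, 24):.2f} h")
```

Both rates hit the ceiling after a little over two hours, so any long-play tape would overflow a classic WAV file.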

Final Muxing

After both export and alignment are complete:

1F    # Create final muxed output

This combines:

  • ProjectName_ffv1.mkv - Video from export
  • ProjectName_aligned.flac - Synchronised audio

Into:

  • ProjectName_final.mkv - Final output with synced audio

Alignment Without Clockgen Lite

If you don't have the Clockgen Lite modification, audio sync is still possible but requires more manual work:

  1. Capture audio via a separate recorder
  2. Manual offset - Determine start offset by comparing audio/video
  3. VhsDecodeAutoAudioAlign - Still handles speed compensation
  4. Manual verification - May need adjustment

Results won't be as accurate as with Clockgen Lite, but are still much better than with no alignment.

Troubleshooting Audio Sync

Audio is offset but consistent

The initial offset calibration may be wrong. Run the VHS Timecode Calibration process to measure the correct offset for your hardware setup. The offset is stored in audio_delay in your configuration.

Audio drifts over time

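This indicates clock drift between RF and audio capture. Clockgen Lite eliminates this. Without it, VhsDecodeAutoAudioAlign still compensates for tape speed variation, but it has no measurement of the drift between the two independent capture clocks, so some residual drift may remain.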

Sync jumps at certain points

This usually indicates dropped fields from tape damage. The alignment tool handles this automatically, but severe damage may cause audible artefacts.

No timing data

If alignment fails with "no timing data", ensure vhs-decode completed successfully and produced a valid .tbc.json file.
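
One way to check a .tbc.json by hand before re-running alignment is sketched below. It is not the toolkit's own validation; it only looks for the fields array and the fileLoc key that appear in the sample on this page, and the function name is hypothetical.

```python
import json
from pathlib import Path

def timing_data_problem(tbc_json_path):
    """Return a description of what is wrong with the file, or None if it looks usable."""
    path = Path(tbc_json_path)
    if not path.is_file():
        return "file does not exist"
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError as exc:
        return f"not valid JSON (decode may have been interrupted): {exc}"
    fields = data.get("fields") if isinstance(data, dict) else None
    if not fields:
        return "no 'fields' array, so there is no per-field timing"
    if any(not isinstance(f, dict) or "fileLoc" not in f for f in fields):
        return "some fields lack 'fileLoc'"
    return None

# Usage:
# problem = timing_data_problem("ProjectName.tbc.json")
# print(problem or "timing data looks usable")
```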

Technical Details

File Dependencies

CAPTURE
├── ProjectName.lds      # RF capture
├── ProjectName.flac     # Audio capture
└── ProjectName.json     # Capture metadata

DECODE
├── ProjectName.tbc      # Time base corrected video
└── ProjectName.tbc.json # TBC metadata with field timing ← Required for alignment

ALIGN
└── ProjectName_aligned.flac  # Synchronised audio output

The .tbc.json File

This file contains per-field timing information that makes precision alignment possible:

{
  "videoParameters": {
    "system": "PAL",
    "fieldWidth": 1135,
    "fieldHeight": 313
  },
  "fields": [
    {"diskLoc": 0, "fieldNo": 0, "fileLoc": 0, "isFirstField": true},
    {"diskLoc": 40000, "fieldNo": 1, "fileLoc": 40000, "isFirstField": false},
    ...
  ]
}

The fields array maps each video field to its exact position in the RF capture, enabling sample-accurate audio alignment.
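
To illustrate how a field position turns into an audio position, the sketch below assumes that fileLoc is a sample index in the 40 MHz RF capture and that the audio runs at 78125 Hz from the same clock (the Clockgen Lite case). Under those assumptions one audio sample spans exactly 512 RF samples. The second field in the example data is invented to sit one nominal PAL field (20 ms) after the first.

```python
RF_SAMPLE_RATE = 40_000_000  # Hz, RF capture rate
AUDIO_SAMPLE_RATE = 78_125   # Hz, Clockgen Lite native audio rate

def field_audio_positions(tbc):
    """Yield (fieldNo, seconds, audio_sample) per field, assuming fileLoc is an RF sample index."""
    rf_per_audio = RF_SAMPLE_RATE / AUDIO_SAMPLE_RATE  # exactly 512.0
    for field in tbc["fields"]:
        seconds = field["fileLoc"] / RF_SAMPLE_RATE
        yield field["fieldNo"], seconds, field["fileLoc"] / rf_per_audio

example = {"fields": [
    {"fieldNo": 0, "fileLoc": 0},
    {"fieldNo": 1, "fileLoc": 800_000},  # hypothetical: 20 ms later
]}

for row in field_audio_positions(example):
    print(row)
```

A nominal PAL field therefore covers 1562.5 audio samples. Real fields come out slightly longer or shorter as tape speed varies, and that variation is what the alignment step follows.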

Audio Sample Rate and Format in Output

The audio is captured at 78125 Hz, the native rate of the Clockgen Lite ADC. This rate is non-standard, and most NLEs (DaVinci Resolve, Premiere, etc.) either reject it outright or resample it themselves at lower quality. The toolkit handles this by resampling once, at the final-mux step, using the ffmpeg filter aresample=resampler=soxr:precision=33:osf=s32 for archival-quality conversion across the non-integer ratio.
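
For orientation, here is a sketch of an ffmpeg invocation built around that filter string, plus the exact rate ratio. It is an assumption about how such a mux could be assembled, not the command the toolkit actually runs: the stream mapping, the use of -ar to set the target rate, and the function name are all choices made for this example.

```python
from fractions import Fraction

SOURCE_RATE = 78_125
SOXR_FILTER = "aresample=resampler=soxr:precision=33:osf=s32"  # filter string quoted above

def final_mux_args(video, audio, output, target_rate=96_000):
    """Build an ffmpeg argument list: copy the video, resample the audio once, encode FLAC."""
    args = ["ffmpeg", "-i", video, "-i", audio,
            "-map", "0:v", "-map", "1:a", "-c:v", "copy"]
    if target_rate is not None:  # None keeps the native 78125 Hz audio
        args += ["-af", SOXR_FILTER, "-ar", str(target_rate)]
    args += ["-c:a", "flac", output]
    return args

# 78125 -> 96000 reduces to 768/625, so there is no integer up- or down-sampling factor.
print(Fraction(96_000, SOURCE_RATE))
print(" ".join(final_mux_args("ProjectName_ffv1.mkv", "ProjectName_aligned.flac",
                              "ProjectName_final.mkv")))
```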

Where in the Pipeline

flowchart LR
    flac[(".flac<br/>78125 Hz<br/>master")] --> align
    tbcjson[(".tbc.json<br/>field timing")] --> align
    align["align<br/>operates at 78125 Hz<br/>(unchanged)"] --> aligned[("_aligned.flac<br/>78125 Hz")]
    aligned --> mux
    ffv1[("_ffv1.mkv<br/>video only")] --> mux
    mux["final-mux<br/>aresample=soxr<br/>(78125 → 96000)"] --> final[("_final.mkv<br/>96 kHz FLAC<br/>(configurable)")]

Resampling happens only at final-mux. Every earlier stage uses 78125 Hz natively. The .flac master is never modified.

Why Not Resample Earlier

  • The align tool maps audio samples to RF sample positions in .tbc.json, which are counted in samples of the 40 MHz RF capture. Its maths is built around the 78125 Hz source rate.
  • Resampling before align would either break the maths (wrong samples-per-field count) or require two resamples in series (extra quality loss).
  • Keeping align at native rate and resampling once at final-mux preserves the cleanest possible sync calculation and the highest possible audio fidelity.

Configuration

Two defaults live in config.json under performance_settings:

| Setting | Default | Options |
| --- | --- | --- |
| default_audio_resample_rate | "96000" | "none", "48000", "96000", "192000" |
| default_audio_format | "flac" | "flac", "wav" |

Default rationale:

  • 96000 Hz is the nearest offered rate above 78125 Hz (88200 Hz would be closer but belongs to the 44.1 kHz family and is not among the options, and 48000 Hz is below the source rate). Resampling up preserves all source bandwidth.
  • FLAC is lossless, compressed, and has no file-size limit. The classic WAV format is capped at 4 GB — too small for an 8-hour LP capture at 24/96 stereo. DaVinci Resolve 16+ (released 2019) supports FLAC natively.

Set "none" if you actively want the _final.mkv to keep the native 78125 Hz audio.

Per-Project Overrides

Defaults can be overridden per-project via the audio flags in Project Flags:

| Flag | Value | Effect |
| --- | --- | --- |
| resample_target | "none" / "48000" / "96000" / "192000" | Per-project resample rate |
| audio_format | "flac" / "wav" | Per-project codec |

Legacy boolean flags resample_48k (= "48000") and output_wav (= "wav") are still respected for backward compatibility.
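
How the defaults, per-project flags, and legacy booleans could combine is sketched below. The setting and flag names are the ones listed above; the precedence order (explicit per-project flag, then legacy boolean, then config default) is an assumption made for this example, and the function name is hypothetical.

```python
DEFAULTS = {"default_audio_resample_rate": "96000", "default_audio_format": "flac"}

def resolve_audio_settings(performance_settings, project_flags):
    """Return (resample_rate, audio_format) for one project."""
    settings = {**DEFAULTS, **performance_settings}
    rate = settings["default_audio_resample_rate"]
    fmt = settings["default_audio_format"]

    # Legacy booleans are applied first so explicit new-style flags can
    # override them (assumed order, not confirmed by the toolkit).
    if project_flags.get("resample_48k"):
        rate = "48000"
    if project_flags.get("output_wav"):
        fmt = "wav"

    rate = project_flags.get("resample_target", rate)
    fmt = project_flags.get("audio_format", fmt)
    return rate, fmt

print(resolve_audio_settings({}, {}))
print(resolve_audio_settings({}, {"resample_48k": True}))
print(resolve_audio_settings({}, {"resample_target": "none", "output_wav": True}))
```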

Upgrading an Existing Capture to a Standard Rate

If you have an old capture whose _final.mkv still has 78125 Hz audio and you want to upgrade it without redoing the slow decode/export:

  1. Confirm default_audio_resample_rate in config.json is set how you want (e.g. "96000")
  2. Delete the existing _aligned.flac and _final.mkv
  3. Re-run Align for the project (1A in the Workflow Control Centre)
  4. Re-run Final mux for the project (1F)

The original .flac, .lds, .tbc, and _ffv1.mkv are reused — only the cheap end of the pipeline runs. The new _final.mkv will have audio at the configured rate.
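
Step 2 can be scripted with a small helper such as the one below. It assumes the ProjectName_aligned.flac / ProjectName_final.mkv naming used on this page and a single project directory; the function names are hypothetical, and it defaults to a dry run so nothing is deleted until you ask for it.

```python
from pathlib import Path

def stale_outputs(project_dir, name):
    """Existing derived files that step 2 says to delete; capture and decode files are never listed."""
    candidates = [f"{name}_aligned.flac", f"{name}_final.mkv"]
    return [
        path
        for path in (Path(project_dir) / filename for filename in candidates)
        if path.exists()
    ]

def remove_stale_outputs(project_dir, name, dry_run=True):
    """Delete the stale outputs (or only report them when dry_run is True)."""
    targets = stale_outputs(project_dir, name)
    if not dry_run:
        for path in targets:
            path.unlink()
    return targets

# Usage:
# print(remove_stale_outputs("/captures/MyTape", "MyTape"))           # report only
# remove_stale_outputs("/captures/MyTape", "MyTape", dry_run=False)   # actually delete
```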

Best Practices

  1. Use Clockgen Lite - The best results require synchronised capture clocks
  2. Capture together - Always capture RF and audio from the same playback
  3. Verify TBC JSON - Ensure decode completed successfully before alignment
  4. Check the result - Watch a segment of the final output to verify sync
  5. Keep originals - Keep your original .flac in case you need to re-align
