podcast-annotations

Every podcast episode is full of references — cars, people, places, concepts — that exist only as sound. Transcripts capture the words, but not the structure. There's no way for apps or tools to know what's being talked about at any given moment.

podcast-annotations is the reference implementation of the Podcast Annotation Format — an open spec for timestamped, typed entity annotations on audio. Think X-Ray for podcasts, but open and format-level instead of locked inside one platform.

Hit play and see:

"LS Engine" appears at 0:45 as the host discusses it
"Turbocharger" pops up at 2:00 with a link to learn more
A timeline shows every topic in the episode — click to jump

All synced to playback. Framework-agnostic vanilla JavaScript — zero dependencies. Built for Car Curious.

Features

Annotation Overlay — Auto-trigger contextual content at specific moments during audio playback
Transcript Sync — Highlight the active transcript segment with auto-scroll and user-interrupt detection
Annotation Timeline — Visual markers showing where annotations appear, with a playhead and click-to-seek
DAI Alignment — Remap canonical transcripts to variant audio with dynamic ad insertion, with gap-aware sync that pauses during ad breaks

Each module works independently. Use one, two, or all three.

Handling timing, overlaps, ad insertion gaps, and transcript sync correctly is surprisingly tricky. This library provides a minimal, battle-tested implementation so you don't have to rebuild it.

Install

npm install podcast-annotations

Or use a CDN:

<script type="module">
  import { AnnotationOverlay, TranscriptSync, AnnotationTimeline } from 'https://esm.sh/podcast-annotations'
</script>

Quick Start

import { AnnotationOverlay, TranscriptSync, AnnotationTimeline } from 'podcast-annotations'

const audio = document.querySelector('audio')

// Annotations auto-trigger during playback
const overlay = new AnnotationOverlay(audio, {
  annotations: [
    { startTime: 45, endTime: 75, type: 'car', title: 'LS Engine' },
    { startTime: 120, endTime: 150, type: 'part', title: 'Turbocharger' }
  ],
  onAnnotationChange(annotation) {
    document.querySelector('#overlay').innerHTML = annotation
      ? `<strong>${annotation.title}</strong>`
      : ''
  },
  onUpcomingChange(upcoming) {
    document.querySelector('#coming-up').innerHTML = upcoming
      .map(a => `<span>${a.title} at ${Math.floor(a.startTime)}s</span>`)
      .join('')
  }
})

// Transcript from a VTT file — renders and syncs automatically
const transcript = await TranscriptSync.fromURL(audio, '/episode.vtt', {
  container: document.querySelector('#transcript'),
  activeClass: 'highlight',
  onAutoScrollPause() {
    document.querySelector('#resume-btn').hidden = false
  }
})

// Or from existing DOM elements (data-start-time attributes)
// const transcript = new TranscriptSync(audio, {
//   container: document.querySelector('#transcript'),
//   segmentSelector: '[data-start-time]'
// })

// Timeline with markers (style via CSS using data-type attributes)
const timeline = new AnnotationTimeline(audio, {
  container: document.querySelector('#timeline'),
  annotations: overlay.annotations,
  onSeek(time) { audio.currentTime = time }
})

API

`AnnotationOverlay(audio, options)`

Option	Type	Default	Description
`annotations`	`Array`	`[]`	`[{ startTime, endTime, data }]`
`overlayElement`	`HTMLElement`	—	Element to add/remove `activeClass`/`hiddenClass`
`activeClass`	`string`	`'active'`	Class when annotation is visible
`hiddenClass`	`string`	`'hidden'`	Class when no annotation
`onAnnotationChange`	`Function`	—	`(annotation \| null) => void`
`onUpcomingChange`	`Function`	—	`(upcoming[]) => void`
`upcomingLimit`	`number`	`3`	Max upcoming annotations
`leadTime`	`number`	`2`	Seconds before startTime to trigger
`transitionBuffer`	`number`	`5`	Gap between consecutive annotations
`maxExtension`	`number`	`60`	Max seconds to extend past endTime

Methods: setAnnotations(annotations), destroy() Getters: currentAnnotation, upcoming

Static factories:

AnnotationOverlay.fromURL(audio, url, options) — Fetch a .annotations.json file and create a synced overlay. Returns { overlay, annotationSet } where annotationSet contains episode metadata, speakers, transcripts, and ad breaks.

const { overlay, annotationSet } = await AnnotationOverlay.fromURL(audio, '/episode.annotations.json', {
  onAnnotationChange(annotation) { ... }
})
// annotationSet.speakers, annotationSet.adBreaks, etc. available for additional UI

Standalone fetch:

fetchAnnotationSet(url) — Fetch and parse a .annotations.json file without creating an overlay.

`TranscriptSync(audio, options)`

Option	Type	Default	Description
`container`	`HTMLElement`	—	Scrollable transcript container
`segmentSelector`	`string`	`'[data-start-time]'`	CSS selector for segments
`startTimeAttribute`	`string`	`'data-start-time'`	Attribute with time in seconds
`activeClass`	`string`	`'active'`	Current segment class
`pastClass`	`string`	`'past'`	Past segments class
`futureClass`	`string`	`'future'`	Future segments class
`autoScroll`	`boolean`	`true`	Auto-scroll to active segment
`scrollBehavior`	`string`	`'smooth'`	Scroll behavior
`onSegmentChange`	`Function`	—	`(element, index) => void`
`onAutoScrollPause`	`Function`	—	Called when user scrolls away
`onAutoScrollResume`	`Function`	—	Called when auto-scroll resumes

Methods: resumeAutoScroll(), refresh(), destroy() Getters: isAutoScrolling

Static factories:

TranscriptSync.fromVTT(audio, vttString, options) — Parse a VTT/SRT string, render segments into container, return a synced instance.
TranscriptSync.fromURL(audio, url, options) — Fetch a VTT/SRT file, render, and return a synced instance (async).
options.renderSegment(cue, element) — Custom renderer for VTT cues. Default renders speaker name + text.

`AnnotationTimeline(audio, options)`

Option	Type	Default	Description
`container`	`HTMLElement`	—	Timeline container element
`annotations`	`Array`	`[]`	Annotation data
`duration`	`number`	—	Total seconds (auto-detected if omitted)
`onSeek`	`Function`	—	`(timeInSeconds) => void`
`markerClass`	`string`	`'pa-timeline-marker'`	CSS class for markers
`playheadClass`	`string`	`'pa-timeline-playhead'`	CSS class for playhead
`renderMarker`	`Function`	—	`(annotation, element) => void`

Markers get data-type attributes for CSS-based styling. Style with:

.pa-timeline-marker[data-type="car"] { background: #60a5fa; }
.pa-timeline-marker[data-type="term"] { background: #c084fc; }

Methods: setAnnotations(annotations), destroy()

`ChapterSync(audio, chapters, options)`

Syncs Podcasting 2.0 JSON chapters with audio playback.

// From a URL
const chapters = await ChapterSync.fromURL(audio, '/chapters.json', {
  container: document.querySelector('#chapters'),
  onChapterChange(chapter) { updateNowPlaying(chapter?.title) }
})

// From a JSON object
const chapters = ChapterSync.fromJSON(audio, chaptersData, options)

// From an array directly
const chapters = new ChapterSync(audio, chaptersArray, options)

Option	Type	Default	Description
`container`	`HTMLElement`	—	Container for rendered chapter list
`activeClass`	`string`	`'active'`	Class for current chapter
`autoplay`	`boolean`	`false`	Start playback when a chapter is clicked
`chapterClass`	`string`	`'pa-chapter'`	CSS class for chapter elements
`onChapterChange`	`Function`	—	`(chapter \| null, index) => void`
`onSeek`	`Function`	—	`(timeInSeconds) => void`
`renderChapter`	`Function`	—	`(chapter, element) => void`

Methods: setChapters(chapters), destroy() Getters: currentChapter Static: ChapterSync.fromJSON(audio, json, options), ChapterSync.fromURL(audio, url, options)

`AlignedTranscript(canonicalCues, mapping)`

Handles DAI (Dynamic Ad Insertion) alignment — takes a canonical transcript and an alignment mapping, produces a timeline with remapped timestamps and gap regions for inserted ads/promos.

import { AlignedTranscript } from 'podcast-annotations'

const aligned = new AlignedTranscript(canonicalCues, {
  confidence: 0.95,
  ranges: [
    { canonicalStart: 0, canonicalEnd: 120, variantStart: 0, variantEnd: 120 },
    { canonicalStart: 120, canonicalEnd: 300, variantStart: 150, variantEnd: 330 }
  ],
  gaps: [
    { variantStart: 120, variantEnd: 150, label: 'ad' }
  ]
})

// Interleaved content + gap segments, sorted by variant time
aligned.segments.forEach(seg => {
  if (seg.type === 'content') {
    renderCue(seg.cue, seg.variantStart, seg.variantEnd)
  } else {
    renderAdBreak(seg.gap.label, seg.gap.variantStart)
  }
})

// Feed remapped cues into TranscriptSync with gap awareness
const transcript = TranscriptSync.fromVTT(audio, vttString, {
  container: document.querySelector('#transcript'),
  gaps: aligned.gaps,
  onGapEnter(gap) { showAdIndicator(gap.label) },
  onGapExit() { hideAdIndicator() }
})

Getters: segments, remappedCues, gaps, confidence, isSyncReliable Methods: isInGap(variantTime)

TranscriptSync gap options

When using TranscriptSync with DAI content, these additional options pause highlighting during ad breaks:

Option	Type	Default	Description
`gaps`	`AlignmentGap[]`	`[]`	Gap ranges where highlighting pauses
`gapClass`	`string`	`'gap'`	Class applied to gap elements
`onGapEnter`	`Function`	—	`(gap) => void` — called when playback enters a gap
`onGapExit`	`Function`	—	`() => void` — called when playback leaves a gap

Methods: setGaps(gaps) — update gap ranges dynamically Getters: isInGap — whether current playback position is inside a gap

Types

interface AlignmentMapping {
  variantHash?: string       // Hash of the variant audio file
  confidence: number         // Confidence score 0–1
  ranges: AlignmentRange[]   // Matched content ranges
  gaps: AlignmentGap[]       // Unmapped gap ranges (ads, promos, etc)
}

interface AlignmentRange {
  canonicalStart: number     // Start time in the canonical transcript
  canonicalEnd: number       // End time in the canonical transcript
  variantStart: number       // Start time in the variant audio
  variantEnd: number         // End time in the variant audio
}

interface AlignmentGap {
  variantStart: number       // Start time in variant audio
  variantEnd: number         // End time in variant audio
  label?: string             // e.g. "ad", "transition", "unknown"
  position?: string          // "pre-roll", "mid-roll", or "post-roll"
}

Timing utilities

import { enrichAnnotationsWithTiming, selectCurrentAnnotation, upcomingAnnotations } from 'podcast-annotations'

Low-level functions if you want full control over the timing logic.

Prior Art

The DAI alignment model was informed by Marco Arment's discussion of transcript synchronization with dynamic ad insertion in Overcast (ATP #683), which helped shape how we combined our existing offset and signature work into the alignment API.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
.github/workflows		.github/workflows
docs		docs
examples		examples
scripts		scripts
src		src
test		test
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
SPEC.md		SPEC.md
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json
tsup.config.ts		tsup.config.ts
vitest.config.js		vitest.config.js

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

podcast-annotations

Features

Install

Quick Start

API

`AnnotationOverlay(audio, options)`

`TranscriptSync(audio, options)`

`AnnotationTimeline(audio, options)`

`ChapterSync(audio, chapters, options)`

`AlignedTranscript(canonicalCues, mapping)`

TranscriptSync gap options

Types

Timing utilities

Prior Art

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

podcast-annotations

Features

Install

Quick Start

API

AnnotationOverlay(audio, options)

TranscriptSync(audio, options)

AnnotationTimeline(audio, options)

ChapterSync(audio, chapters, options)

AlignedTranscript(canonicalCues, mapping)

TranscriptSync gap options

Types

Timing utilities

Prior Art

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`AnnotationOverlay(audio, options)`

`TranscriptSync(audio, options)`

`AnnotationTimeline(audio, options)`

`ChapterSync(audio, chapters, options)`

`AlignedTranscript(canonicalCues, mapping)`

Packages