Every podcast episode is full of references — cars, people, places, concepts — that exist only as sound. Transcripts capture the words, but not the structure. There's no way for apps or tools to know what's being talked about at any given moment.
podcast-annotations is the reference implementation of the Podcast Annotation Format — an open spec for timestamped, typed entity annotations on audio. Think X-Ray for podcasts, but open and format-level instead of locked inside one platform.
Hit play and see:
- "LS Engine" appears at 0:45 as the host discusses it
- "Turbocharger" pops up at 2:00 with a link to learn more
- A timeline shows every topic in the episode — click to jump
All synced to playback. Framework-agnostic vanilla JavaScript — zero dependencies. Built for Car Curious.
- Annotation Overlay — Auto-trigger contextual content at specific moments during audio playback
- Transcript Sync — Highlight the active transcript segment with auto-scroll and user-interrupt detection
- Annotation Timeline — Visual markers showing where annotations appear, with a playhead and click-to-seek
- DAI Alignment — Remap canonical transcripts to variant audio with dynamic ad insertion, with gap-aware sync that pauses during ad breaks
Each module works independently. Use one, two, or all three.
Handling timing, overlaps, ad insertion gaps, and transcript sync correctly is surprisingly tricky. This library provides a minimal, battle-tested implementation so you don't have to rebuild it.
npm install podcast-annotationsOr use a CDN:
<script type="module">
import { AnnotationOverlay, TranscriptSync, AnnotationTimeline } from 'https://esm.sh/podcast-annotations'
</script>import { AnnotationOverlay, TranscriptSync, AnnotationTimeline } from 'podcast-annotations'
const audio = document.querySelector('audio')
// Annotations auto-trigger during playback
const overlay = new AnnotationOverlay(audio, {
annotations: [
{ startTime: 45, endTime: 75, type: 'car', title: 'LS Engine' },
{ startTime: 120, endTime: 150, type: 'part', title: 'Turbocharger' }
],
onAnnotationChange(annotation) {
document.querySelector('#overlay').innerHTML = annotation
? `<strong>${annotation.title}</strong>`
: ''
},
onUpcomingChange(upcoming) {
document.querySelector('#coming-up').innerHTML = upcoming
.map(a => `<span>${a.title} at ${Math.floor(a.startTime)}s</span>`)
.join('')
}
})
// Transcript from a VTT file — renders and syncs automatically
const transcript = await TranscriptSync.fromURL(audio, '/episode.vtt', {
container: document.querySelector('#transcript'),
activeClass: 'highlight',
onAutoScrollPause() {
document.querySelector('#resume-btn').hidden = false
}
})
// Or from existing DOM elements (data-start-time attributes)
// const transcript = new TranscriptSync(audio, {
// container: document.querySelector('#transcript'),
// segmentSelector: '[data-start-time]'
// })
// Timeline with markers (style via CSS using data-type attributes)
const timeline = new AnnotationTimeline(audio, {
container: document.querySelector('#timeline'),
annotations: overlay.annotations,
onSeek(time) { audio.currentTime = time }
})| Option | Type | Default | Description |
|---|---|---|---|
annotations |
Array |
[] |
[{ startTime, endTime, data }] |
overlayElement |
HTMLElement |
— | Element to add/remove activeClass/hiddenClass |
activeClass |
string |
'active' |
Class when annotation is visible |
hiddenClass |
string |
'hidden' |
Class when no annotation |
onAnnotationChange |
Function |
— | (annotation | null) => void |
onUpcomingChange |
Function |
— | (upcoming[]) => void |
upcomingLimit |
number |
3 |
Max upcoming annotations |
leadTime |
number |
2 |
Seconds before startTime to trigger |
transitionBuffer |
number |
5 |
Gap between consecutive annotations |
maxExtension |
number |
60 |
Max seconds to extend past endTime |
Methods: setAnnotations(annotations), destroy()
Getters: currentAnnotation, upcoming
Static factories:
AnnotationOverlay.fromURL(audio, url, options)— Fetch a.annotations.jsonfile and create a synced overlay. Returns{ overlay, annotationSet }whereannotationSetcontains episode metadata, speakers, transcripts, and ad breaks.
const { overlay, annotationSet } = await AnnotationOverlay.fromURL(audio, '/episode.annotations.json', {
onAnnotationChange(annotation) { ... }
})
// annotationSet.speakers, annotationSet.adBreaks, etc. available for additional UIStandalone fetch:
fetchAnnotationSet(url)— Fetch and parse a.annotations.jsonfile without creating an overlay.
| Option | Type | Default | Description |
|---|---|---|---|
container |
HTMLElement |
— | Scrollable transcript container |
segmentSelector |
string |
'[data-start-time]' |
CSS selector for segments |
startTimeAttribute |
string |
'data-start-time' |
Attribute with time in seconds |
activeClass |
string |
'active' |
Current segment class |
pastClass |
string |
'past' |
Past segments class |
futureClass |
string |
'future' |
Future segments class |
autoScroll |
boolean |
true |
Auto-scroll to active segment |
scrollBehavior |
string |
'smooth' |
Scroll behavior |
onSegmentChange |
Function |
— | (element, index) => void |
onAutoScrollPause |
Function |
— | Called when user scrolls away |
onAutoScrollResume |
Function |
— | Called when auto-scroll resumes |
Methods: resumeAutoScroll(), refresh(), destroy()
Getters: isAutoScrolling
Static factories:
TranscriptSync.fromVTT(audio, vttString, options)— Parse a VTT/SRT string, render segments into container, return a synced instance.TranscriptSync.fromURL(audio, url, options)— Fetch a VTT/SRT file, render, and return a synced instance (async).options.renderSegment(cue, element)— Custom renderer for VTT cues. Default renders speaker name + text.
| Option | Type | Default | Description |
|---|---|---|---|
container |
HTMLElement |
— | Timeline container element |
annotations |
Array |
[] |
Annotation data |
duration |
number |
— | Total seconds (auto-detected if omitted) |
onSeek |
Function |
— | (timeInSeconds) => void |
markerClass |
string |
'pa-timeline-marker' |
CSS class for markers |
playheadClass |
string |
'pa-timeline-playhead' |
CSS class for playhead |
renderMarker |
Function |
— | (annotation, element) => void |
Markers get data-type attributes for CSS-based styling. Style with:
.pa-timeline-marker[data-type="car"] { background: #60a5fa; }
.pa-timeline-marker[data-type="term"] { background: #c084fc; }Methods: setAnnotations(annotations), destroy()
Syncs Podcasting 2.0 JSON chapters with audio playback.
// From a URL
const chapters = await ChapterSync.fromURL(audio, '/chapters.json', {
container: document.querySelector('#chapters'),
onChapterChange(chapter) { updateNowPlaying(chapter?.title) }
})
// From a JSON object
const chapters = ChapterSync.fromJSON(audio, chaptersData, options)
// From an array directly
const chapters = new ChapterSync(audio, chaptersArray, options)| Option | Type | Default | Description |
|---|---|---|---|
container |
HTMLElement |
— | Container for rendered chapter list |
activeClass |
string |
'active' |
Class for current chapter |
autoplay |
boolean |
false |
Start playback when a chapter is clicked |
chapterClass |
string |
'pa-chapter' |
CSS class for chapter elements |
onChapterChange |
Function |
— | (chapter | null, index) => void |
onSeek |
Function |
— | (timeInSeconds) => void |
renderChapter |
Function |
— | (chapter, element) => void |
Methods: setChapters(chapters), destroy()
Getters: currentChapter
Static: ChapterSync.fromJSON(audio, json, options), ChapterSync.fromURL(audio, url, options)
Handles DAI (Dynamic Ad Insertion) alignment — takes a canonical transcript and an alignment mapping, produces a timeline with remapped timestamps and gap regions for inserted ads/promos.
import { AlignedTranscript } from 'podcast-annotations'
const aligned = new AlignedTranscript(canonicalCues, {
confidence: 0.95,
ranges: [
{ canonicalStart: 0, canonicalEnd: 120, variantStart: 0, variantEnd: 120 },
{ canonicalStart: 120, canonicalEnd: 300, variantStart: 150, variantEnd: 330 }
],
gaps: [
{ variantStart: 120, variantEnd: 150, label: 'ad' }
]
})
// Interleaved content + gap segments, sorted by variant time
aligned.segments.forEach(seg => {
if (seg.type === 'content') {
renderCue(seg.cue, seg.variantStart, seg.variantEnd)
} else {
renderAdBreak(seg.gap.label, seg.gap.variantStart)
}
})
// Feed remapped cues into TranscriptSync with gap awareness
const transcript = TranscriptSync.fromVTT(audio, vttString, {
container: document.querySelector('#transcript'),
gaps: aligned.gaps,
onGapEnter(gap) { showAdIndicator(gap.label) },
onGapExit() { hideAdIndicator() }
})Getters: segments, remappedCues, gaps, confidence, isSyncReliable
Methods: isInGap(variantTime)
When using TranscriptSync with DAI content, these additional options pause highlighting during ad breaks:
| Option | Type | Default | Description |
|---|---|---|---|
gaps |
AlignmentGap[] |
[] |
Gap ranges where highlighting pauses |
gapClass |
string |
'gap' |
Class applied to gap elements |
onGapEnter |
Function |
— | (gap) => void — called when playback enters a gap |
onGapExit |
Function |
— | () => void — called when playback leaves a gap |
Methods: setGaps(gaps) — update gap ranges dynamically
Getters: isInGap — whether current playback position is inside a gap
interface AlignmentMapping {
variantHash?: string // Hash of the variant audio file
confidence: number // Confidence score 0–1
ranges: AlignmentRange[] // Matched content ranges
gaps: AlignmentGap[] // Unmapped gap ranges (ads, promos, etc)
}
interface AlignmentRange {
canonicalStart: number // Start time in the canonical transcript
canonicalEnd: number // End time in the canonical transcript
variantStart: number // Start time in the variant audio
variantEnd: number // End time in the variant audio
}
interface AlignmentGap {
variantStart: number // Start time in variant audio
variantEnd: number // End time in variant audio
label?: string // e.g. "ad", "transition", "unknown"
position?: string // "pre-roll", "mid-roll", or "post-roll"
}import { enrichAnnotationsWithTiming, selectCurrentAnnotation, upcomingAnnotations } from 'podcast-annotations'Low-level functions if you want full control over the timing logic.
The DAI alignment model was informed by Marco Arment's discussion of transcript synchronization with dynamic ad insertion in Overcast (ATP #683), which helped shape how we combined our existing offset and signature work into the alignment API.
MIT