Add diarize pipeline node #40

@natashaannn

Objective

Migrate speaker diarization into the DAG as a first-class pipeline node.


Why this matters

After sync and transcribe, diarization is the next stage the DAG needs before it can run a realistic end-to-end transcript pipeline.


Parallelization metadata

Track

pipeline

Depends on

Safe to run in parallel with

  • type contract issues
  • remotion utility issues

Merge risk

  • medium

Files in scope

Primary:

  • scripts/pipeline/nodes/diarize.ts

Avoid touching:

  • Python diarization internals

Required implementation

1. Implement node wrapper

Expose diarization as a PipelineNode.

2. Define inputs/outputs

Use content-addressed artifact references.

3. Delegate to current implementation

No ML logic rewrite.
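
A minimal sketch of the three steps above, offered as one possible shape rather than the project's actual API. The `PipelineNode`, `ArtifactRef`, `DiarizeInputs`, `DiarizeOutputs`, and `DiarizeDelegate` types and the `createDiarizeNode` factory are all hypothetical names introduced for illustration; the real `PipelineNode` contract and the existing diarization entry point may look different. The delegate is injected so the wrapper contains no ML logic.

```typescript
// Hypothetical reference to a content-addressed artifact.
// The real artifact type in the repo may carry different fields.
interface ArtifactRef {
  kind: string; // e.g. "audio" or "diarization"
  hash: string; // content address of the artifact bytes
}

// Hypothetical node contract; the actual PipelineNode may differ.
interface PipelineNode<I, O> {
  name: string;
  dependsOn: string[];
  run(inputs: I): Promise<O>;
}

// Assumed input and output shapes for the diarize stage.
interface DiarizeInputs {
  audio: ArtifactRef;
}

interface DiarizeOutputs {
  diarization: ArtifactRef;
}

// Stand-in for the current diarization implementation (for example,
// a function that invokes the existing Python code and returns a
// reference to the stored result).
type DiarizeDelegate = (audio: ArtifactRef) => Promise<ArtifactRef>;

// Wrapper only: validates the delegate's result and passes it through.
function createDiarizeNode(
  delegate: DiarizeDelegate
): PipelineNode<DiarizeInputs, DiarizeOutputs> {
  return {
    name: "diarize",
    // Assumption: diarize is scheduled after transcribe, per the handoff contract.
    dependsOn: ["transcribe"],
    async run(inputs: DiarizeInputs): Promise<DiarizeOutputs> {
      const diarization = await delegate(inputs.audio);
      if (diarization.kind !== "diarization") {
        throw new Error(`unexpected artifact kind: ${diarization.kind}`);
      }
      return { diarization };
    },
  };
}
```

Injecting the delegate keeps the node testable with a fake and leaves the Python diarization internals untouched.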


Constraints

  • wrapper only
  • no pyannote changes

Handoff contract

Runner should support:

sync -> transcribe -> diarize
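
One way a runner could derive this order is a depth-first topological sort over declared dependencies. The sketch below is an assumption about how ordering might work, not a description of the existing runner; `NodeSpec` and `executionOrder` are hypothetical names.

```typescript
// Hypothetical minimal view of a node: just its name and dependencies.
interface NodeSpec {
  name: string;
  dependsOn: string[];
}

// Returns node names in an order where every node follows its dependencies.
// Throws on unknown dependencies and on cycles.
function executionOrder(nodes: NodeSpec[]): string[] {
  const byName = new Map<string, NodeSpec>();
  for (const node of nodes) {
    byName.set(node.name, node);
  }

  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (name: string): void => {
    const current = state.get(name);
    if (current === "done") return;
    if (current === "visiting") {
      throw new Error(`dependency cycle at ${name}`);
    }
    const node = byName.get(name);
    if (node === undefined) {
      throw new Error(`unknown node: ${name}`);
    }
    state.set(name, "visiting");
    for (const dep of node.dependsOn) {
      visit(dep);
    }
    state.set(name, "done");
    order.push(name);
  };

  for (const node of nodes) {
    visit(node.name);
  }
  return order;
}

// Usage: declaration order does not matter, only the dependencies do.
const order = executionOrder([
  { name: "diarize", dependsOn: ["transcribe"] },
  { name: "transcribe", dependsOn: ["sync"] },
  { name: "sync", dependsOn: [] },
]);
console.log(order.join(" -> ")); // prints "sync -> transcribe -> diarize"
```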


Acceptance criteria

Functional

  • diarize executes through the runner

Integration checks

  • expected diarization artifact is produced

Verification commands

Run a three-stage pipeline (sync -> transcribe -> diarize) through the runner.

Expected result:

  • diarize executes after transcribe
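
The expected result could be checked mechanically if the runner records the names of the nodes it executes, which is an assumption here. The `ranAfter` helper and the log format (an array of node names in execution order) are hypothetical.

```typescript
// Returns true only if both nodes appear in the log and `later`
// was executed after `earlier`.
function ranAfter(log: string[], later: string, earlier: string): boolean {
  const earlierIndex = log.indexOf(earlier);
  const laterIndex = log.indexOf(later);
  return earlierIndex !== -1 && laterIndex !== -1 && earlierIndex < laterIndex;
}

// Usage with a hypothetical execution log from a three-stage run.
const executionLog = ["sync", "transcribe", "diarize"];
console.log(ranAfter(executionLog, "diarize", "transcribe")); // prints true
```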

Explicitly out of scope

  • align node
  • assign-speakers node

Suggested branch

refactor/s6-diarize-node

Suggested commit slug

phase-2-step-2-diarize-node
