Add assign-speakers pipeline node #46

@natashaannn

Objective

Assign speaker labels to diarized + aligned transcript segments.


Why this matters

This is the first stage where the pipeline begins producing human-usable structured transcript data, not just ML outputs.


Parallelization metadata

Track

pipeline

Depends on

Safe to run in parallel with

  • merge-doc node
  • pipeline validation

Merge risk

  • medium

Files in scope

Primary:

  • scripts/pipeline/nodes/assign-speakers.ts

Avoid touching:

  • diarization model logic

Required implementation

1. Implement node wrapper

Expose as PipelineNode.

2. Input contract

Must consume:

  • diarized segments
  • aligned transcript

3. Output contract

Produces:

  • speaker-assigned transcript JSON
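The three implementation steps above could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the `PipelineNode` shape, the segment types, and the factory `createAssignSpeakersNode` are assumptions made for the example, and `assignByMaxOverlap` is only a stand-in for the existing assignment logic, which this issue says must be reused rather than rewritten.

```typescript
// All types here are hypothetical; the real PipelineNode interface and
// artifact shapes in the repository may differ.
interface DiarizedSegment { start: number; end: number; speaker: string; }
interface AlignedSegment { start: number; end: number; text: string; }
interface SpeakerAssignedSegment extends AlignedSegment { speaker: string | null; }

interface AssignSpeakersInput {
  diarizedSegments: DiarizedSegment[];
  alignedTranscript: AlignedSegment[];
}

interface PipelineNode<I, O> {
  name: string;
  run(input: I): Promise<O>;
}

// Assumed signature of the existing assignment logic. It is injected so the
// node stays a thin wrapper and contains no assignment or ML code itself.
type AssignFn = (
  diarized: DiarizedSegment[],
  aligned: AlignedSegment[],
) => SpeakerAssignedSegment[];

function createAssignSpeakersNode(
  assign: AssignFn,
): PipelineNode<AssignSpeakersInput, SpeakerAssignedSegment[]> {
  return {
    name: "assign-speakers",
    async run(input) {
      // Delegate to the injected logic; the result is the JSON-serializable output.
      return assign(input.diarizedSegments, input.alignedTranscript);
    },
  };
}

// Illustrative stand-in only: label each aligned segment with the speaker
// whose diarized turn overlaps it the most, or null when nothing overlaps.
const assignByMaxOverlap: AssignFn = (diarized, aligned) =>
  aligned.map((seg) => {
    let best: string | null = null;
    let bestOverlap = 0;
    for (const turn of diarized) {
      const overlap =
        Math.min(seg.end, turn.end) - Math.max(seg.start, turn.start);
      if (overlap > bestOverlap) {
        bestOverlap = overlap;
        best = turn.speaker;
      }
    }
    return { ...seg, speaker: best };
  });
```

Passing the assignment function in as a parameter keeps the wrapper independent of the diarization model logic, which fits the "avoid touching" and "no ML changes" constraints; in the real node the injected function would be the existing implementation rather than the stand-in.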

Constraints

  • reuse the existing assignment logic only
  • no ML changes

Acceptance criteria

Functional

  • speaker labels assigned correctly

Integration checks

  • the output artifact exists in the DAG

Verification commands

Run the pipeline through the assign-speakers node.

Expected result:

  • transcript contains speaker labels
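One way to check the expected result mechanically is sketched below. It assumes the output JSON is an array of segments with an optional `speaker` field; that shape, the `LabeledSegment` type, and the helper `findUnlabeledSegments` are hypothetical and not taken from the repository.

```typescript
// Hypothetical output shape: one entry per transcript segment.
interface LabeledSegment {
  start: number;
  end: number;
  text: string;
  speaker?: string | null;
}

// Returns the indices of segments that have no usable speaker label.
// An empty result means every segment in the transcript is labeled.
function findUnlabeledSegments(transcript: LabeledSegment[]): number[] {
  const missing: number[] = [];
  transcript.forEach((seg, i) => {
    if (typeof seg.speaker !== "string" || seg.speaker.trim() === "") {
      missing.push(i);
    }
  });
  return missing;
}
```

After running the pipeline, the output artifact could be parsed with `JSON.parse` and passed to this helper; a non-empty return value points at the segments that still need attention.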

Explicitly out of scope

  • merge-doc
  • final transcript formatting

Suggested branch

refactor/s7-assign-speakers

Suggested commit slug

phase-2-step-6-assign-speakers
