fix: support PyannoteAudioPretrainedSpeakerEmbedding in speaker mapping by wkochFPV · Pull Request #644 · speaches-ai/speaches

wkochFPV · 2026-05-17T08:03:22Z

known_speaker_names did not work with the pyannote/speaker-diarization-community-1 model when using the new pyannote diarization.

Disclaimer: This fix was created entirely with claude.ai Sonnet. It works perfectly for me, which is why I am sharing it. I did not personally review the changes, but I ran many successful tests with it.

The pyannote/speaker-diarization-community-1 model uses PyannoteAudioPretrainedSpeakerEmbedding as its embedding backend, which has no .eval() method and expects a 3D tensor [batch, channels, samples] instead of an audio dict.

This caused _map_to_known_speakers to always fail with:
AttributeError: 'PyannoteAudioPretrainedSpeakerEmbedding' has no attribute 'eval'
ValueError: shapes (1,256) and (1,256) not aligned (missing .flatten())
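The second error can be reproduced in isolation. The sketch below assumes the similarity step uses a NumPy dot product (the message format matches NumPy's, but the PR text does not show the actual call), and the random vectors are placeholders rather than real embeddings:

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Placeholder embeddings with a leading batch axis, shape (1, 256).
rng = np.random.default_rng(0)
known = rng.standard_normal((1, 256))
turn = rng.standard_normal((1, 256))

# A dot product of two (1, 256) arrays is treated as a matrix product,
# whose inner dimensions (256 and 1) do not match.
try:
    np.dot(known, turn)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Flattening both operands to shape (256,) gives a plain vector dot product.
similarity = cosine_similarity(known.flatten(), turn.flatten())
```

With 1D inputs the dot product returns a scalar, which is what the cosine-similarity comparison needs.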

Fix:

- Add _to_3d() helper to normalize tensor dimensions
- Add _embed() with fallback: try Inference() first, then direct call with 3D tensor
- Add _embed_crop() with same fallback for per-turn crops
- Always .flatten() the result to ensure 1D vector for cosine similarity


wkochFPV wants to merge 1 commit into speaches-ai:master from wkochFPV:fix/speaker-diarization-community-embedding
