[tts] long utts: constituicao might not be ready to go

There's a considerable mismatch in dataset characteristics between Constituicao and LJSpeech. Audio clips in the former are longer (20s-40s), while the latter's do not usually go beyond 10s, and I'm not sure whether this plays nicely with FastSpeech 2's recipe. As a matter of fact, ESPnet's TTS recipe [ignores audios longer than 20s by default](https://github.com/espnet/espnet/blob/master/egs2/TEMPLATE/tts1/tts.sh#L52).
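To get a rough idea of how much data such a cutoff would discard, something like the sketch below could be used. It is not part of the ESPnet recipe: `wav_duration`, `split_by_duration` and `MAX_DUR_S` are hypothetical names, the 20s threshold just mirrors the default mentioned above, and whether the recipe treats the boundary itself as kept or dropped was not checked.

```python
import wave

MAX_DUR_S = 20.0  # assumed cutoff, mirroring the 20s default mentioned above


def wav_duration(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()


def split_by_duration(durations, max_dur=MAX_DUR_S):
    """Split {utt_id: seconds} into (kept, dropped) around max_dur.

    Utterances exactly at max_dur are kept here; that is an assumption.
    """
    kept = {u: d for u, d in durations.items() if d <= max_dur}
    dropped = {u: d for u, d in durations.items() if d > max_dur}
    return kept, dropped
```

Feeding it a dict built with `wav_duration` over the corpus would show how many Constituicao utts fall on each side of the threshold.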

![](https://user-images.githubusercontent.com/2287025/233843765-c10f3dbc-c9ac-45db-99e3-2f1d82814e35.png)

A possible way forward would be to re-segment Constituicao to make individual utts shorter. MFA has been finding SILs in the middle of sentences quite often; in fact, the speaker pauses between titles and at the end of sentences. A VAD and an FA (forced aligner) would be of great help with that.
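A minimal sketch of what that re-segmentation could look like, assuming the alignment has already been exported as a time-sorted list of `(label, start, end)` tuples. The `resegment` function, the silence labels, and the `max_dur`/`min_sil` defaults are all assumptions for illustration, not something MFA or ESPnet provides. It cuts at every pause of at least `min_sil` seconds and then greedily merges neighbouring chunks while the result stays within `max_dur`.

```python
def resegment(intervals, max_dur=10.0, min_sil=0.2, sil_labels=("", "sil", "sp")):
    """Split an aligned utterance at pauses into segments of at most max_dur.

    intervals: time-sorted list of (label, start, end); silence labels are
    an assumption and may differ in the actual alignment output.
    Returns a list of (start, end, [words]).
    """
    # 1) Cut into chunks at every silence lasting at least min_sil seconds.
    chunks, words = [], []
    for label, start, end in intervals:
        if label in sil_labels:
            if end - start >= min_sil and words:
                chunks.append(words)
                words = []
            continue  # shorter pauses stay inside the current chunk
        words.append((label, start, end))
    if words:
        chunks.append(words)

    # 2) Greedily merge adjacent chunks while the merged span fits max_dur.
    segments = []
    for chunk in chunks:
        if segments and chunk[-1][2] - segments[-1][0][1] <= max_dur:
            segments[-1].extend(chunk)
        else:
            segments.append(list(chunk))

    return [(seg[0][1], seg[-1][2], [w for w, _, _ in seg]) for seg in segments]
```

Note that a chunk with no usable pause inside it is left as-is even if it exceeds `max_dur`, so those would still need manual inspection or a lower `min_sil`.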

[plot_scripts.zip](https://github.com/falabrasil/espnet-br/files/11304564/plot_scripts.zip)

