Audio Codes "at utterance level"

## ❓ Questions

I'm interested in using the encoder to encode an audio fragment of a few seconds into a single codebook vector. However, the model returns a sequence of several `audio_codes` (of course, since that is the only way to successfully decode the audio afterwards).

How would you recommend using the encoder, and/or pre- or post-processing the audio input or the `audio_codes`, to obtain a single audio code "at utterance level"?

Thanks in advance.

