Passing language has inconsistent results #3

@lazzarello

Description

Passing a language string to the engine does not consistently produce output in that language. For example, the log below comes from an English input recording with the language parameter set to Chinese ("zh"), yet the output is English text.

> {"event": "on", "type": "transcribe", "language": "zh"}
Recording started
Warning: Some sources (like microphones) may produce inaudible results
         with 8-bit sampling. Use '-f' argument to increase resolution
         e.g. '-f S16_LE'.
Recording WAVE 'test.wav' : Unsigned 8 bit, Rate 8000 Hz, Mono
{"event": "off", "type": "transcribe"}
Aborted by signal Terminated...
Recording stopped
Device set to use cpu
Output language:  zh
/home/lee/.pyenv/versions/3.10.15/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py:573: FutureWarning: The input name `inputs` is deprecated. Please make sure to use `input_features` instead.
  warnings.warn(
You have passed language=zh, but also have set `forced_decoder_ids` to [[1, None], [2, 50360]] which creates a conflict. `forced_decoder_ids` will be ignored in favor of language=zh.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.

Later attempts, seemingly at random, do translate correctly: the English "I like Chinese food" comes out as "我喜欢中国菜", which is correct. To do: look into the attention mask and the EOS and PAD tokens. What is "my input's attention_mask"?
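
On the attention-mask question, here is a minimal sketch of the general idea, not of this project's code or of the transformers internals. When an input is padded to a fixed length, the attention mask is a parallel array with 1 at real positions and 0 at padding. The helper name `pad_with_attention_mask` and the toy sample values are made up for illustration.

```python
from typing import List, Tuple


def pad_with_attention_mask(
    samples: List[float], target_len: int, pad_value: float = 0.0
) -> Tuple[List[float], List[int]]:
    """Pad (or truncate) `samples` to `target_len` and return (padded, mask).

    mask[i] is 1 where padded[i] is real input and 0 where it is padding.
    """
    real = samples[:target_len]          # truncate if the input is too long
    n_pad = target_len - len(real)       # number of padding positions to add
    padded = real + [pad_value] * n_pad
    mask = [1] * len(real) + [0] * n_pad
    return padded, mask


# Toy example: 5 real samples padded to a fixed window of 8.
# Note the real sample 0.0 at index 3: it has the same value as the padding,
# so the mask cannot be recovered from the padded values alone.
audio = [0.1, -0.2, 0.3, 0.0, 0.05]
padded, mask = pad_with_attention_mask(audio, target_len=8)
print(padded)
print(mask)
```

The warning in the log describes the same ambiguity at the token level: when the pad token equals the EOS token, the library cannot tell padding from real content and asks for an explicit mask. In transformers, the Whisper feature extractor accepts `return_attention_mask=True`, and the resulting `attention_mask` can be passed to `generate()` alongside `input_features`. Whether that resolves the inconsistent output language here is untested.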

Labels: bug (Something isn't working)
