_strip_incomplete_words cuts complete words #32

@azziko

Hello,

I saw this and wondered: wouldn't this cut complete words too? Assume no token falls within the AlignAtt threshold, and the hypothesis is:

['▁U', 'ser', '▁Inter', 'ac', 'tion', '.']

Then the whole word "Interaction." is cut, even though it was not within the frame_threshold:

selected_tokens = self._strip_incomplete_words(selected_tokens)

If this is not intended, I would put it like this:

        # Truncate tokens up to the first invalid alignment (if any)
        if len(invalid_tok_ids) > 0:
            selected_tokens = selected_tokens[:invalid_tok_ids[0]]
            if self.word_level_postprocess:
                selected_tokens = self._strip_incomplete_words(selected_tokens)
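
To make the difference concrete, here is a minimal standalone sketch. It is not the repository's code: `strip_incomplete_words` is a hypothetical stand-in that assumes the real `_strip_incomplete_words` drops everything from the last `▁`-prefixed token onward, and `postprocess` with its `guarded` flag is only an illustration of the two variants (stripping on every call versus stripping only after a truncation).

```python
from typing import List

WORD_START = "\u2581"  # SentencePiece word-boundary marker '▁'


def strip_incomplete_words(tokens: List[str]) -> List[str]:
    """Hypothetical stand-in for `_strip_incomplete_words`.

    Assumption: the last word is always treated as possibly unfinished,
    so everything from the last word-initial token onward is dropped.
    """
    for i in range(len(tokens) - 1, -1, -1):
        if tokens[i].startswith(WORD_START):
            return tokens[:i]
    return []  # no word boundary at all: nothing is known to be complete


def postprocess(
    tokens: List[str],
    invalid_tok_ids: List[int],
    word_level_postprocess: bool = True,
    guarded: bool = True,
) -> List[str]:
    """Truncate at the first invalid alignment, then optionally strip.

    guarded=False: strip on every call (the behaviour questioned above).
    guarded=True:  strip only if a truncation happened (the proposal).
    """
    truncated = len(invalid_tok_ids) > 0
    if truncated:
        tokens = tokens[: invalid_tok_ids[0]]
    if word_level_postprocess and (truncated or not guarded):
        tokens = strip_incomplete_words(tokens)
    return tokens


hypothesis = ["▁U", "ser", "▁Inter", "ac", "tion", "."]

# Nothing is inside the threshold, yet the last word is dropped.
print(postprocess(hypothesis, [], guarded=False))  # ['▁U', 'ser']

# With the guard, the untouched hypothesis is kept as is.
print(postprocess(hypothesis, [], guarded=True))

# A truncation in the middle of a word still strips the partial word.
print(postprocess(hypothesis, [4], guarded=True))  # ['▁U', 'ser']
```

Under that assumption, the guard only changes the case where no alignment is invalid; truncated hypotheses are handled the same way in both variants.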

But maybe that's intended for models that output partial words.
