This repository was archived by the owner on Jun 8, 2020. It is now read-only.

Matecat Segmentation Issues #30

@uhallac

Description


We have been experiencing segmentation issues with Matecat. Please see the following examples:

  1. When a number inside a sentence is followed by a period. This happens most of the time in both general and paragraph segmentation:
  2. When a quoted sentence appears inside another sentence:
    https://www.matecat.com/translate/quotesodt/tr-TR-en-GB/1710109-904edd6b46db#966996041
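The two failure cases above can be illustrated with a minimal rule-based sketch. It is not Matecat's segmenter (the issue does not describe how Matecat segments text). `naive_segment`, `segment` and `_inside_quotes` are hypothetical names, and the two exception rules are assumptions chosen to mirror the examples: in Turkish, a period right after a number usually marks an ordinal ("3. kat", "3rd floor") rather than the end of a sentence, and a period inside a quotation should not split the surrounding sentence.

```python
import re

# A candidate boundary: sentence-final punctuation followed by whitespace.
BREAK = re.compile(r"[.!?]\s+")


def naive_segment(text: str) -> list[str]:
    """Break at every candidate boundary, mirroring the behaviour reported in the issue."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def _inside_quotes(text: str, pos: int) -> bool:
    """True if an odd number of quote marks precede pos, i.e. pos is inside a quotation."""
    return sum(text.count(q, 0, pos) for q in '"\u201c\u201d') % 2 == 1


def segment(text: str) -> list[str]:
    """Split text into segments, skipping two kinds of false boundaries (hypothetical rules)."""
    segments, start = [], 0
    for m in BREAK.finditer(text):
        punct, nxt = m.start(), m.end()  # punctuation index, index of the next word
        # Rule 1: digit + "." + lowercase word is a Turkish ordinal ("3. kat"), not a boundary.
        if (text[punct] == "." and punct > 0 and text[punct - 1].isdigit()
                and nxt < len(text) and text[nxt].islower()):
            continue
        # Rule 2: never break inside a quotation.
        if _inside_quotes(text, punct):
            continue
        segments.append(text[start:punct + 1].strip())
        start = nxt
    tail = text[start:].strip()  # whatever follows the last accepted boundary
    if tail:
        segments.append(tail)
    return segments


# "The meeting will be held in the hall on the 3rd floor. Everyone should attend."
ordinal = "Toplantı 3. kattaki salonda yapılacak. Herkes katılsın."
# Ahmet said, "I'm coming tomorrow. Wait for me."
quoted = 'Ahmet "Yarın geliyorum. Bekle beni." dedi.'

print(naive_segment(ordinal))  # ['Toplantı 3.', 'kattaki salonda yapılacak.', 'Herkes katılsın.']
print(segment(ordinal))        # ['Toplantı 3. kattaki salonda yapılacak.', 'Herkes katılsın.']
print(segment(quoted))         # ['Ahmet "Yarın geliyorum. Bekle beni." dedi.']
```

Real segmenters usually express rules like these declaratively (for example as break/no-break exception rules in an SRX file) rather than hard-coding them; the quote rule here is deliberately crude and would misfire on unbalanced quotation marks.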

I'm aware that segmentation is not an easy task, but poor segmentation produces messed-up TMs when translating between two syntactically different languages such as Turkish and English: because the word order differs, the English for one half of a wrongly split Turkish sentence often belongs in the other segment. For these two segments to be reflected properly in the translated document, we have to improvise with the segment translations, as you can see below:
[Screenshot]

I believe the most efficient and simplest way to solve this problem is to add the ability to merge multiple segments into one from the UI. At the moment, Matecat only lets you merge segments that were previously split. This does not seem like a complex technical task, and it would bring substantial benefits. What do you think?
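The proposed merge operation can be sketched as a small list transformation. The `Segment` record and `merge_segments` function are hypothetical and do not reflect Matecat's internal data model. The sketch assumes segments are stored in document order, that only adjacent segments may be merged, and that source texts are joined with a single space (which would be wrong for languages written without spaces). The original segment IDs are kept in `merged_from`, so a merged segment can still be traced back to the original segmentation.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    seg_id: int
    source: str
    target: str = ""
    merged_from: list[int] = field(default_factory=list)  # original IDs, empty if never merged


def merge_segments(segments: list[Segment], first: int, last: int) -> list[Segment]:
    """Return a new list in which segments[first..last] (inclusive) become one segment."""
    if not 0 <= first < last < len(segments):
        raise ValueError("select at least two adjacent segments inside the list")
    group = segments[first:last + 1]
    merged = Segment(
        seg_id=group[0].seg_id,  # the merged segment keeps the first segment's ID
        source=" ".join(s.source for s in group),  # assumes space-separated scripts
        target=" ".join(s.target for s in group if s.target),  # keep any existing drafts
        merged_from=[i for s in group for i in (s.merged_from or [s.seg_id])],
    )
    return segments[:first] + [merged] + segments[last + 1:]


# The ordinal example, wrongly split into three segments.
segs = [Segment(1, "Toplantı 3."), Segment(2, "kattaki salonda yapılacak."),
        Segment(3, "Herkes katılsın.")]
result = merge_segments(segs, 0, 1)
print([s.source for s in result])  # ['Toplantı 3. kattaki salonda yapılacak.', 'Herkes katılsın.']
print(result[0].merged_from)       # [1, 2]
```

Returning a new list instead of mutating the input keeps the original segmentation available, which is one simple way an "undo merge" action could be supported.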
