
Fix for mangled text in Aux.csv #83

Open
gooosedev wants to merge 1 commit into taku910:master from gooosedev:mecab-juman-fixes


@gooosedev


This PR should fix the generation of the mangled Aux.csv file (issue #81).

The issue seems to originate in the conversion script. Inside the convert function, when an element e with a ctype of 無活用型 is processed, the last byte of the POS is truncated.

The solution I found was to add the following special case:

else {
    # Special case for some non-inflecting entries.
    if ($ctype eq "無活用型") {
        my $result = "";
        for (@$surface) {
            $result .= sprintf("%s,0,0,0,%s,%s,%s,*,%s,%s,%s\n",
                $_, $pos1, $pos2, $ctype, $_, $reading, $meaning);
        }
        return $result;
    }
}

This results in these entries:

です,0,0,0,助動詞,*,無活用型,*,です,です,代表表記:です/です
まい,0,0,0,助動詞,*,無活用型,*,まい,まい,代表表記:まい/まい
ことだ,0,0,0,助動詞,*,無活用型,*,ことだ,ことだ,代表表記:ことだ/ことだ
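
For anyone who wants to sanity-check rows of this shape, here is a minimal standalone sketch. The helper names `format_uninflected` and `row_is_intact` are hypothetical and are not part of the conversion script. The first mirrors the special case above, and the second assumes the 11-column layout of the entries shown, with the ctype in the seventh column.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;                              # string literals below are UTF-8
use open qw(:std :encoding(UTF-8));    # print wide characters as UTF-8

# Hypothetical helper mirroring the special case: one CSV row per surface form.
sub format_uninflected {
    my ($surface, $pos1, $pos2, $ctype, $reading, $meaning) = @_;
    my $result = "";
    for my $form (@$surface) {
        $result .= sprintf("%s,0,0,0,%s,%s,%s,*,%s,%s,%s\n",
            $form, $pos1, $pos2, $ctype, $form, $reading, $meaning);
    }
    return $result;
}

# Hypothetical check: a row is intact if it has 11 columns and the
# seventh column (index 6) is exactly the expected ctype.
sub row_is_intact {
    my ($row, $ctype) = @_;
    my @fields = split /,/, $row, -1;  # -1 keeps trailing empty fields
    return @fields == 11 && $fields[6] eq $ctype;
}

my $rows = format_uninflected(["です"], "助動詞", "*", "無活用型",
                              "です", "代表表記:です/です");
for my $row (split /\n/, $rows) {
    print row_is_intact($row, "無活用型") ? "ok: $row\n" : "MANGLED: $row\n";
}
```

The check compares decoded characters rather than raw bytes, so a row whose ctype lost its final byte would no longer match and would be reported as mangled.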

Note: model.def, left-id.def, and right-id.def will probably need to be regenerated and the model retrained.

@gooosedev gooosedev marked this pull request as ready for review October 14, 2025 07:36
