Hello, thank you very much for your work on this project. I'm using MeCab for a language-learning program, and would like to use this library if possible.
The mecab binary lets you pass arguments that change its output format. For example:
```
$ mecab -F %m\\t%t\\t%h\\n -U %m\\t%t\\t%h\\n -E EOP\\t3\\t7\\n
太郎はこの本を女性に渡した。
太郎 2 44
は 6 16
この 6 68
本 2 38
を 6 13
女性 2 38
に 6 13
渡し 2 31
た 6 25
。 3 7
EOP 3 7
```
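A note on escaping, since the Python attempt below also tried single backslashes. In the shell, each unquoted `\\` becomes one backslash, so mecab receives the two characters `\` and `t` and expands them itself. The sketch below uses the standard `shlex` module to emulate that shell parsing. It checks that the Python literal `'\\t'` yields the same argument, while `'\t'` yields a real tab character. This sketch only compares strings and does not call MeCab. The variable names are illustrative.

```python
import shlex

# The option exactly as typed (unquoted) in the shell command above.
shell_typed = r"-F %m\\t%t\\t%h\\n"

# POSIX shell rules: an unquoted backslash escapes the next character,
# so "\\t" reaches mecab as a backslash followed by "t".
shell_args = shlex.split(shell_typed)

# The same format written as Python string literals.
python_double = "%m\\t%t\\t%h\\n"  # backslash + "t", same as the shell passes
python_single = "%m\t%t\t%h\n"     # real TAB and newline characters

print(shell_args)
print(shell_args[1] == python_double)
print(shell_args[1] == python_single)
```

So the `\\` spelling in the Tagger string already matches what the shell hands to mecab. The single-backslash spelling embeds real whitespace, which is likely to be split apart when MeCab tokenizes the option string.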
Is there a way to get the same with this Python library? I tried some obvious attempts, e.g.:

```python
import MeCab
t = MeCab.Tagger('-F %m\\t%t\\t%h\\n -U %m\\t%t\\t%h\\n -E EOP\\t3\\t7\\n -r ./mecabrc_dummy.txt -d ./.venv/lib/python3.11/site-packages/unidic_lite/dicdir') # also tried single \ instead of \\
sentence = "太郎はこの本を女性に渡した。"
print(t.parse(sentence))
```
but this still outputs the same as the default Tagger output:

```
$ python main.py
太郎 タロー タロウ タロウ 名詞-固有名詞-人名-名 1
は ワ ハ は 助詞-係助詞
この コノ コノ 此の 連体詞 0
...
渡し ワタシ ワタス 渡す 動詞-一般 五段-サ行 連用形-一般 0
た タ タ た 助動詞 助動詞-タ 終止形-一般
。 。 補助記号-句点
EOS
```

I edited `unidic_lite/dicdir/dicrc`:

```
output-format-type = custom
; output custom - new three-column output
node-format-custom = %m\t%t\t%h\n
unk-format-custom = %m\t%t\t%h\n
bos-format-custom =
eos-format-custom = EOP\t3\t7\n
```

With that, the output was more or less what I expected (the third column is different, but that doesn't matter):

```
$ python main.py
太郎 2 1
は 6 1
この 6 1
本 2 1
...
た 6 1
。 3 1
EOP 3 7
```

I did try with unidic instead of unidic_lite,

```python
t = MeCab.Tagger('-r ./mecabrc_dummy.txt -d ./.venv/lib/python3.11/site-packages/unidic/dicdir -F %m\\t%t\\t%h\\n -U %m\\t%t\\t%h\\n -E EOP\\t3\\t7\\n')
```

and got the default unidic output:

```
太郎 名詞,固有名詞,人名,名,,,タロウ,タロウ,太郎,タロー,太郎,タロー,固,"","","","","","",名,タロウ,タロウ,タロウ,タロウ,"1","","",6252931250790912,22748
は 助詞,係助詞,,,,,ハ,は,は,ワ,は,ワ,和,"","","","","","",係助,ハ,ハ,ハ,ハ,"","動詞%F2@0,名詞%F1,形容詞%F2@-1","",8059703733133824,29321
この 連体詞,,,,,,コノ,此の,この,コノ,この,コノ,和,"","","","","","",相,コノ,コノ,コノ,コノ,"0","","",3547308012741120,12905
...
。 補助記号,句点,,,,,,。,。,,。,,記号,"","","","","","",補助,,,,,"","","",6880571302400,25
EOS
```
Thank you again!