What are the "scope" and "num" columns in the corpus? #19

@karmalet

Hi, may I ask what the "scope" and "num" columns stand for?

In "idioms_pretrain.json",

| idiom | num | explanation |
| --- | --- | --- |
| 偃武崇文 | 0 | 停息武备,崇尚文教。 |
| 洪乔捎书 | 0 | 指言而无信的人。 |
| 南郭先生 | 103 | 比喻无才而占据其位的人。 |

In "idioms_scopes.tsv",

| scope | idiom | id |
| --- | --- | --- |
| Scope I | 见义勇为 | 0 |
| Scope II | 偃武崇文 | 3848 |
| Scope III | 亏于一篑 | 33237 |

In "idiom_synonyms.tsv",

| query | synonym | query_id | synonym_id | overlapping |
| --- | --- | --- | --- | --- |
| 黯然销魂 | 六神无主 | 14726 | 1333 | 0 |
| 黯然销魂 | 丧魂失魄 | 14726 | 2704 | 1 |
| 塞翁失马,焉知非福 | 塞翁失马,安知非福 | 24524 | 32175 | 8 |

I thought "overlapping" was the number of Chinese characters the two idioms share, but the last row shows 8, where I would expect 7.
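
Here is a minimal Python sketch of the count I have in mind. It assumes "overlapping" is the number of distinct characters that appear in both idioms, which is my guess and not something the repository documents. The helper name `char_overlap` is hypothetical. Under that assumption, counting the comma as a character gives 8 for the last row, while ignoring punctuation gives 7, so the comma may explain the difference.

```python
import unicodedata


def char_overlap(a: str, b: str, count_punctuation: bool = True) -> int:
    """Count the distinct characters that appear in both strings.

    Assumption: this is only a guess at how the "overlapping" column
    might be computed; it is not taken from the repository's code.
    """
    shared = set(a) & set(b)
    if not count_punctuation:
        # Unicode general categories starting with "P" are punctuation,
        # which covers both the ASCII and the full-width comma.
        shared = {ch for ch in shared
                  if not unicodedata.category(ch).startswith("P")}
    return len(shared)


# The three sample rows from "idiom_synonyms.tsv" quoted above.
rows = [
    ("黯然销魂", "六神无主"),
    ("黯然销魂", "丧魂失魄"),
    ("塞翁失马,焉知非福", "塞翁失马,安知非福"),
]

for query, synonym in rows:
    with_punct = char_overlap(query, synonym)
    without_punct = char_overlap(query, synonym, count_punctuation=False)
    print(query, synonym, with_punct, without_punct)
```

The first two rows give 0 and 1 either way, which matches the file. Only the last row changes depending on whether the comma is counted.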

Thanks!
