Update for CLICS4 #29

@LinguList

For CLICS4, we will have some 52 datasets, all segmented and therefore analyzable with LingPy's cognate detection methods. This means we can offer enhanced networks, which requires integrating code that has already been written, but not yet for pyclics:

  1. code for the identification of cognates among colexifications in the same family (https://github.com/clics/clicsbp/blob/fd571023865366e5be654d6ff05f1f36dcba1272/clicsbpcommands/colexifications.py#L173-L217)
  2. code for the computation of weights using random walks (this will increase the paths among concepts through neighbors and could be useful for semantic metrics in the future, but it is not clear how feasible it is to run it on all data): https://github.com/clics/clicsbp/blob/fd571023865366e5be654d6ff05f1f36dcba1272/clicsbpcommands/colexifications.py#L127-L167
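
To make the random-walk idea in point 2 concrete, here is a minimal sketch in plain Python. It is not the clicsbp implementation linked above; the function name `random_walk_weights`, its parameters, and the toy network are assumptions made for illustration. It estimates, for each concept, the share of short weighted walks that reach another concept, so that concepts connected only through neighbors also receive a weight.

```python
import random
from collections import defaultdict


def random_walk_weights(graph, walks=100, steps=3, seed=42):
    """Estimate concept-to-concept weights from short weighted random walks.

    graph: dict mapping concept -> {neighbour: colexification weight}
    Returns a dict mapping (start, visited) to the share of walks from
    `start` that reached `visited` within `steps` steps.
    Hypothetical helper, not part of pyclics or clicsbp.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    visits = defaultdict(int)
    for start in graph:
        for _ in range(walks):
            current = start
            seen = set()
            for _ in range(steps):
                neighbours = graph.get(current)
                if not neighbours:
                    break
                nodes = list(neighbours)
                # pick the next concept proportionally to the edge weight
                current = rng.choices(
                    nodes, weights=[neighbours[n] for n in nodes])[0]
                if current != start:
                    seen.add(current)
            # count each reached concept once per walk
            for node in seen:
                visits[start, node] += 1
    return {pair: count / walks for pair, count in visits.items()}


# Hypothetical toy network: ARM and FINGER are linked only via HAND.
toy = {
    "ARM": {"HAND": 10},
    "HAND": {"ARM": 10, "FINGER": 4},
    "FINGER": {"HAND": 4},
}
weights = random_walk_weights(toy)
```

In this toy network, `weights[("ARM", "FINGER")]` is non-zero although the two concepts share no direct edge. The cost grows with the number of concepts times `walks` times `steps`, which is one way to gauge the feasibility question raised above.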

We have been asked about certain aspects of the CLICS data where the data online differs from the data we report in Concepticon (e.g., weighted degree). This time, it would therefore be good to compute the Concepticon table (or NoRaRe table) directly when computing CLICS, so that we have a concrete reference and no hidden script that runs on someone's computer and is not officially shared. So, when doing the colexification search, we should additionally:

  1. compute statistics (weighted degree, degree)
  2. run the subgraph method, which is currently run directly in CLLD, in the Python code as well, to determine the subgraphs
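
The two steps above could be sketched as follows. This is only an illustration under stated assumptions: the edge format, the helper names `node_statistics` and `ego_subgraph`, and the `max_neighbours` cut-off are hypothetical, and the subgraph function is a simple stand-in (a concept plus its strongest neighbors), not necessarily the method currently run in CLLD.

```python
from collections import defaultdict


def node_statistics(edges):
    """Compute degree and weighted degree per concept.

    edges: iterable of (concept_a, concept_b, weight), one per colexification.
    Hypothetical helper, not part of pyclics.
    """
    degree = defaultdict(int)
    weighted = defaultdict(int)
    for a, b, w in edges:
        for node in (a, b):
            degree[node] += 1      # number of colexification links
            weighted[node] += w    # sum of the link weights
    return {
        n: {"degree": degree[n], "weighted_degree": weighted[n]}
        for n in degree
    }


def ego_subgraph(edges, center, max_neighbours=2):
    """Stand-in subgraph: `center`, its strongest neighbours, edges among them."""
    incident = [
        (w, b if a == center else a)
        for a, b, w in edges
        if center in (a, b)
    ]
    strongest = sorted(incident, reverse=True)[:max_neighbours]
    keep = {center} | {node for _, node in strongest}
    return [(a, b, w) for a, b, w in edges if a in keep and b in keep]


# Hypothetical toy data
edges = [
    ("ARM", "HAND", 10),
    ("HAND", "FINGER", 4),
    ("HAND", "FIVE", 2),
    ("FINGER", "TOE", 3),
]
stats = node_statistics(edges)
hand_graph = ego_subgraph(edges, "HAND")
```

Writing `stats` to the Concepticon (or NoRaRe) table in the same run as the colexification search would make the published numbers reproducible from the shared code.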

All in all, this is SOME work to be done.

To explain the subgraph issue: some users asked why the data on the website differs from the data in the Concepticon version of CLICS3 (Rzymski-2020-XXXX list).
