Description
For clics4, we will have about 52 datasets, all segmented and therefore analyzable with LingPy cognate detection methods. This means we can offer enhanced networks, which requires integrating code that has already been written, but not yet for pyclics:
- code for identifying cognates among colexifications in the same family (https://github.com/clics/clicsbp/blob/fd571023865366e5be654d6ff05f1f36dcba1272/clicsbpcommands/colexifications.py#L173-L217)
- code for computing weights using random walks (this increases the number of paths between concepts through their neighbors and could be useful for semantic metrics in the future, but it is not clear how feasible it is to run it on all data: https://github.com/clics/clicsbp/blob/fd571023865366e5be654d6ff05f1f36dcba1272/clicsbpcommands/colexifications.py#L127-L167)
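The linked clicsbp code is not reproduced here. As a rough sketch of the first idea, assuming each colexification record carries a language family and a cognate-set ID (e.g., assigned by LingPy cognate detection), colexifications in one family that are reflected by the same cognate set can be collapsed, so that an inherited word counts once per family instead of once per language. The record layout and the name `count_colexifications` are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Colexification(NamedTuple):
    """One form covering two concepts in one language (hypothetical layout)."""
    language: str
    family: str
    concept_a: str
    concept_b: str
    cognate_id: str  # cognate set from automatic cognate detection (assumed)


def count_colexifications(records: Iterable[Colexification]) -> dict:
    """Count families, languages, and distinct cognate sets per concept pair.

    Several languages of one family sharing the same cognate set for a
    colexification contribute only once to "words".
    """
    families = defaultdict(set)
    languages = defaultdict(set)
    cognates = defaultdict(set)
    for rec in records:
        # Sort the pair so that A-B and B-A are treated as the same edge.
        edge = tuple(sorted((rec.concept_a, rec.concept_b)))
        families[edge].add(rec.family)
        languages[edge].add(rec.language)
        # Cognate IDs are only meaningful within a family, so key by both.
        cognates[edge].add((rec.family, rec.cognate_id))
    return {
        edge: {
            "families": len(families[edge]),
            "languages": len(languages[edge]),
            "words": len(cognates[edge]),
        }
        for edge in families
    }
```

With toy data where two languages of one family share a cognate colexifying ARM and HAND and one language of another family has an unrelated form, the pair is attested in 3 languages but only 2 independent words.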
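How the clicsbp random-walk weights are computed exactly is not shown here. One common formulation, used below as an assumption, row-normalizes the weighted graph into transition probabilities and sums the probabilities of reaching each node in 1 to k steps, so concepts linked only through a shared neighbor receive a nonzero weight. The names `random_walk_weights` and `steps` are hypothetical. The nested loop over all start nodes also illustrates the feasibility concern: the cost grows with the number of nodes times the size of the k-step neighborhood.

```python
from collections import defaultdict


def transition_probabilities(graph):
    """Row-normalize a weighted undirected graph {node: {neighbor: weight}}."""
    probs = {}
    for node, neighbors in graph.items():
        total = sum(neighbors.values())
        probs[node] = {nb: w / total for nb, w in neighbors.items()} if total else {}
    return probs


def random_walk_weights(graph, steps=3):
    """Sum the probabilities of reaching each node from each start node
    within 1..steps random-walk steps (self-returns are dropped)."""
    probs = transition_probabilities(graph)
    weights = {}
    for start in graph:
        current = {start: 1.0}  # distribution after 0 steps
        accumulated = defaultdict(float)
        for _ in range(steps):
            nxt = defaultdict(float)
            for node, p in current.items():
                for nb, q in probs.get(node, {}).items():
                    nxt[nb] += p * q
            for node, p in nxt.items():
                accumulated[node] += p
            current = nxt
        accumulated.pop(start, None)  # ignore probability of returning to start
        weights[start] = dict(accumulated)
    return weights
```

In a path A–B–C, a two-step walk gives A and C a weight of 0.5 even though they are not directly colexified. The graph is expected to be symmetric (each edge listed under both nodes).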
Given that we were asked about certain aspects of the CLICS data where the data online differ from the data we report in Concepticon (e.g., weighted degree), it would be good this time to also compute the Concepticon table (or NoRaRe table) directly when computing CLICS. That way we have a concrete reference instead of a hidden script that runs on someone's computer and is not officially shared. So, when doing the colexification search, we should additionally:
- compute statistics (degree, weighted degree)
- run the subgraph method in the Python code as well (it is currently run directly in the CLLD app) to determine the subgraphs
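A minimal sketch of the statistics step, assuming the colexification network is available as a mapping from concept pairs to an edge weight (for instance the number of families, which is an assumption here) with each pair listed once. Degree counts a node's neighbors; weighted degree sums the weights of its edges. The function names and TSV column names are hypothetical, and the subgraph method of the CLLD app is not reproduced.

```python
import csv
import sys
from collections import defaultdict


def node_statistics(edges):
    """Compute degree and weighted degree for an undirected graph.

    `edges` maps a concept pair (a, b) to a weight; each pair must occur once.
    """
    degree = defaultdict(int)
    weighted = defaultdict(float)
    for (a, b), weight in edges.items():
        if a == b:
            continue  # a concept does not colexify with itself
        for node in (a, b):
            degree[node] += 1
            weighted[node] += weight
    return {
        node: {"degree": degree[node], "weighted_degree": weighted[node]}
        for node in degree
    }


def table_rows(stats):
    """Turn the statistics into rows for a TSV table (hypothetical columns)."""
    rows = [["CONCEPT", "DEGREE", "WEIGHTED_DEGREE"]]
    for node in sorted(stats):
        rows.append([node, stats[node]["degree"], stats[node]["weighted_degree"]])
    return rows


if __name__ == "__main__":
    toy_edges = {("ARM", "HAND"): 10, ("FINGER", "HAND"): 3}  # toy weights
    writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
    writer.writerows(table_rows(node_statistics(toy_edges)))
```

Writing such a table as part of the CLICS build, rather than computing it separately, would keep the Concepticon/NoRaRe values and the website values tied to the same code.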
All in all, this is SOME work to be done.
To explain the subgraph issue: some users asked why the data on the website differ from the data in the Concepticon version of CLICS3 (the Rzymski-2020-XXXX list).