Quite often, people do not gloss person or place names or unparsable words, so some words may be present only in Primary_Text and not in Analyzed_Word or Gloss.
The most transparent way to store an example like that in CLDF is to have an empty list item in these two columns:
Primary_Text: "x y Person z"
Analyzed_Word: "x\ty\t\tz" (["x","y",None,"z"] once read by pycldf)
Gloss: "xg\tyg\t\tzg" (["xg","yg",None,"zg"])
This passes validation, but cldf createdb, for example, fails with TypeError: sequence item 1: expected str instance, NoneType found. As a workaround, I've been doing things like ex["Analyzed_Word"] = ["" if x is None else x for x in ex["Analyzed_Word"]] in initializedb.py scripts.
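For reference, here is a minimal, self-contained sketch of that workaround applied to both columns. It does not use pycldf at all: the `ex` dict is a hypothetical row shaped like the values above, and `fill_empty_items` is a helper name made up for illustration. The `TypeError` quoted above has the form that `str.join` produces when a list contains `None`, so the sketch reproduces that with a plain join; whether cldf createdb fails at exactly such a call is an assumption.

```python
def fill_empty_items(items, placeholder=""):
    """Replace None entries in a list-valued column with a placeholder string."""
    return [placeholder if item is None else item for item in items]


# Hypothetical example row, shaped like the values described above.
ex = {
    "Primary_Text": "x y Person z",
    "Analyzed_Word": ["x", "y", None, "z"],
    "Gloss": ["xg", "yg", None, "zg"],
}

# Joining the raw list fails, because str.join only accepts strings.
try:
    "\t".join(ex["Analyzed_Word"])
    join_failed = False
except TypeError:
    join_failed = True

# Normalize both aligned columns before handing the row on.
for column in ("Analyzed_Word", "Gloss"):
    ex[column] = fill_empty_items(ex[column])

# After normalization the list serializes back to the original cell value.
rejoined = "\t".join(ex["Analyzed_Word"])
```

Using a helper for both columns keeps Analyzed_Word and Gloss the same length, which is the point of storing the empty item in the first place.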
Should empty items in a gloss column raise an error upon validation? If so, is the way to handle unglossed words simply to leave them out (i.e. "x\ty\tz", read as ["x","y","z"])? Or, if empty items are allowed, would it be OK for pycldf to yield "" instead of None (i.e. "x\ty\t\tz", read as ["x","y","","z"])?
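To make the trade-off between these options concrete, here is a rough sketch that checks word-level alignment for each representation. It assumes, purely for illustration, that Primary_Text is split on whitespace and that each word corresponds to exactly one item in Analyzed_Word and Gloss; `is_aligned` is a hypothetical helper, not part of pycldf.

```python
def is_aligned(primary_text, analyzed_word, gloss):
    """Check that both list columns have one item per whitespace-separated word."""
    n_words = len(primary_text.split())
    return len(analyzed_word) == n_words == len(gloss)


primary_text = "x y Person z"

# Option 1: keep the empty item as None (what the lists currently look like).
keep_none = is_aligned(
    primary_text, ["x", "y", None, "z"], ["xg", "yg", None, "zg"]
)

# Option 2: keep the empty item as an empty string.
keep_empty_string = is_aligned(
    primary_text, ["x", "y", "", "z"], ["xg", "yg", "", "zg"]
)

# Option 3: leave the unglossed word out entirely.
dropped = is_aligned(primary_text, ["x", "y", "z"], ["xg", "yg", "zg"])
```

Under that assumption, both ways of keeping an empty item preserve the position of the unglossed word, while leaving it out means the third gloss no longer lines up with the third word of Primary_Text.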