How to handle unglossed words?

Quite often, people do not gloss personal names, place names, or unparsable words, so some words may be present only in `Primary_Text` but not in `Analyzed_Word` or `Gloss`.

The most transparent way to store an example like that in CLDF is to have an empty list item in these two columns:

`Primary_Text`:  `"x y Person z"`
`Analyzed_Word`: `"x\ty\t\tz"` (`["x","y",None,"z"]` once read by pycldf)
`Gloss`: `"xg\tyg\t\tzg"` (`["xg","yg",None,"zg"]`)

This passes validation, but `cldf createdb`, for example, fails with `TypeError: sequence item 1: expected str instance, NoneType found`. As a workaround, I've been doing things like `ex["Analyzed_Word"] = ["" if x is None else x for x in ex["Analyzed_Word"]]` in `initializedb.py` scripts.
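For illustration, here is a minimal, self-contained sketch of that workaround. It does not use pycldf: the example row is a plain dict shaped like the values above, and `fill_empty_items` is a hypothetical helper name. The same kind of `TypeError` can be reproduced with a plain `str.join`, which is presumably (an assumption, not checked against the pycldf source) what happens somewhere inside `cldf createdb`.

```python
def fill_empty_items(items, placeholder=""):
    """Return a copy of `items` with every None replaced by `placeholder`."""
    return [placeholder if item is None else item for item in items]


# A row shaped like the example above, as it might look once read (assumption).
example = {
    "Primary_Text": "x y Person z",
    "Analyzed_Word": ["x", "y", None, "z"],
    "Gloss": ["xg", "yg", None, "zg"],
}

# Joining a list that contains None raises the same kind of TypeError.
try:
    "\t".join(example["Analyzed_Word"])
    error_message = None
except TypeError as err:
    error_message = str(err)

# Workaround: normalize both aligned columns before any string joining.
for column in ("Analyzed_Word", "Gloss"):
    example[column] = fill_empty_items(example[column])

serialized = "\t".join(example["Analyzed_Word"])
```

After normalization the list keeps its four positions, so it still lines up with the four words of `Primary_Text`.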

Should empty items in a gloss column raise an error upon validation? If so, is the right way to handle unglossed words simply to leave them out (i.e. `"x\ty\tz"`, read as `["x","y","z"]`)? Or, if empty items are allowed, would it be OK for pycldf to yield `""` instead of `None` (i.e. `"x\ty\t\tz"`, read as `["x","y","","z"]`)?
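To make the trade-off between these options concrete, the following sketch compares the three representations on the example above. It is not pycldf's reader: `read_items` is a hypothetical stand-in that splits a tab-separated cell and maps empty segments to a chosen value.

```python
def read_items(cell, empty=None):
    """Split a tab-separated cell; map empty segments to `empty`."""
    return [part if part else empty for part in cell.split("\t")]


primary_words = "x y Person z".split()

with_none = read_items("x\ty\t\tz")              # empty item read as None
with_empty = read_items("x\ty\t\tz", empty="")   # empty item read as ""
left_out = read_items("x\ty\tz")                 # unglossed word omitted

# With a placeholder item, positions still match the words of Primary_Text.
aligned = list(zip(primary_words, with_empty))

# With the word left out, later items shift: "Person" is paired with "z".
misaligned = list(zip(primary_words, left_out))
```

Under these assumptions, both `None` and `""` preserve the word-by-word alignment with `Primary_Text`, while leaving the item out does not; only `""` can be joined back into a string without extra handling.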

How to handle unglossed words? #158
