text-to-pose: Use more datasets

The [`sign-language-datasets`](https://github.com/sign-language-processing/datasets) package includes datasets that could be used for this task:

- [x] dicta sign (dictionary - hamnosys / pose)
- [ ] sign2mint (dictionary - signwriting / videos; poses can be easily extracted)
- [ ] signtyp (signwriting / videos) - quite noisy, I don't think it is worth using.
- [ ] swojs_glossario (signwriting / videos) - very high quality, but contains long sentences rather than single words, so I don't think it is worth using.


Other datasets, not currently available in `sign-language-datasets`, would probably be good to add:

- [ ] [DGS lexicon](https://www.sign-lang.uni-hamburg.de/meinedgs/ling/types_en.html) (dictionary - hamnosys / videos) - The DGS Corpus includes a lexicon covering all of its glosses. Clicking on a gloss (skipping all the `$` glosses for ease) opens a page such as [this one](https://www.sign-lang.uni-hamburg.de/meinedgs/types/type28940_en.html), which includes hamnosys and a video. [Sometimes](https://www.sign-lang.uni-hamburg.de/meinedgs/types/type16109_en.html) only hamnosys exists, but with references to where in the DGS Corpus this hamnosys was performed. This is a very powerful way to get multiple almost-aligned examples of the same sequence in a non-dictionary setting (could probably extract 50-200k+ samples).
- [ ] [The Polish Sign Language corpus](https://www.slownikpjm.uw.edu.pl/en/list) includes many dictionary videos ([example](https://www.slownikpjm.uw.edu.pl/en/gloss/view/1926)) with hamnosys.
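
For the DGS lexicon item, a first step could be to enumerate the type pages to scrape. The sketch below is only an illustration: the URL pattern is inferred from the two example links above, the helper names (`is_lexical_gloss`, `dgs_type_url`, `candidate_urls`) are hypothetical, and treating "skip all $s" as "skip glosses that start with `$`" is an assumption. The gloss names in the usage example are placeholders, not real lexicon entries.

```python
# URL pattern inferred from the two example links in this issue (an assumption,
# not a documented API of the DGS Corpus website).
DGS_TYPE_URL = "https://www.sign-lang.uni-hamburg.de/meinedgs/types/type{type_id}_en.html"


def is_lexical_gloss(gloss: str) -> bool:
    """Assumption: glosses to skip are exactly those that start with '$'."""
    return not gloss.startswith("$")


def dgs_type_url(type_id: int) -> str:
    """Build the English type page URL for a numeric type id."""
    return DGS_TYPE_URL.format(type_id=type_id)


def candidate_urls(entries):
    """Map each kept gloss to its type page URL.

    `entries` is an iterable of (gloss, type_id) pairs, for example parsed
    from the types index page (the parsing itself is not shown here).
    """
    return {
        gloss: dgs_type_url(type_id)
        for gloss, type_id in entries
        if is_lexical_gloss(gloss)
    }


if __name__ == "__main__":
    # Placeholder gloss names; only the two type ids come from the links above.
    entries = [("EXAMPLE-A", 28940), ("EXAMPLE-B", 16109), ("$EXAMPLE-SKIPPED", 0)]
    for gloss, url in candidate_urls(entries).items():
        print(gloss, url)
```

Each resulting page would still need to be fetched and parsed for its hamnosys string, video, and corpus references, which is left out of this sketch.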

