This repository was archived by the owner on May 10, 2026. It is now read-only.
text-to-pose: Use more datasets #3

@AmitMY

Description

The sign-language-datasets package includes datasets that could be used for this task:

  • dicta sign (dictionary - HamNoSys / pose)
  • sign2mint (dictionary - SignWriting / videos; poses can be easily extracted)
  • signtyp (SignWriting / videos) - quite noisy, I don't think it is worth using.
  • swojs_glossario (SignWriting / videos) - very high quality, but long sentences rather than single words. I don't think it is worth using.
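
To make the triage above easier to act on, here is a minimal sketch that records the four candidates as data and filters them. The `Candidate` class, its field names, and the helper functions are hypothetical and exist only for illustration; the dataset names are copied from this issue and may not match the identifiers used by the sign-language-datasets package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One dataset mentioned in this issue (hypothetical structure)."""
    name: str                # name as written in the issue, not a verified package identifier
    notation: str            # "hamnosys" or "signwriting", as listed in the issue
    media: str               # "pose" or "videos", as listed in the issue
    flagged_not_worth: bool  # True if the issue says "I don't think it is worth using"


# Values transcribed from the list above; nothing here is measured or downloaded.
CANDIDATES = [
    Candidate("dicta sign", "hamnosys", "pose", False),
    Candidate("sign2mint", "signwriting", "videos", False),
    Candidate("signtyp", "signwriting", "videos", True),
    Candidate("swojs_glossario", "signwriting", "videos", True),
]


def shortlist(candidates):
    """Names of datasets the issue does not flag as not worth using."""
    return [c.name for c in candidates if not c.flagged_not_worth]


def needs_pose_extraction(candidates):
    """Shortlisted datasets that ship videos only, so poses must be extracted first."""
    return [
        c.name
        for c in candidates
        if not c.flagged_not_worth and c.media == "videos"
    ]


if __name__ == "__main__":
    print("shortlist:", shortlist(CANDIDATES))
    print("needs pose extraction:", needs_pose_extraction(CANDIDATES))
```

Under these assumptions, dicta sign can be used directly, while sign2mint needs a pose-extraction step before it fits the text-to-pose setup.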

There are also more datasets, not currently available in sign-language-datasets, that would probably be good to add:

  • dgs lexicon (dictionary - HamNoSys / videos) - The DGS Corpus includes a lexicon for all of its glosses. Clicking on a gloss (skip all $s for ease) leads to an entry (example) that includes HamNoSys and a video. Sometimes only the HamNoSys exists, but with references to where in the DGS Corpus it was performed. This is very powerful for getting multiple almost-aligned examples of the same sequence in a non-dictionary manner (could probably extract 50-200k+ samples).
  • The Polish Sign Language corpus includes many dictionary videos (example) with HamNoSys.

Metadata

Assignees

No one assigned

    Labels

    enhancement (New feature or request), help wanted (Extra attention is needed)
