Ensure that the CNN vocabulary in the embedding layer covers all relevant words #86

@Tilana

Description

For CNN classification, an embedding layer of size [len(vocabulary), embedding_size] is created.
If the training data contains only a few sentences, the resulting vocabulary is very small, and words that appear only in the test set have no entry in it, so they are not considered.
Therefore, either provide a pre-defined/pre-trained vocabulary built from general HR documents, or use all documents of a dataset to collect the relevant words, independent of the training/testing split.
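
The second option could look roughly like the sketch below. It is not taken from the project code: the helper names (`tokenize`, `build_vocabulary`, `encode`), the `<pad>`/`<unk>` tokens, and the example sentences are all assumptions made for illustration. The idea is to build the vocabulary from every document in the dataset before splitting, so that the first dimension of the embedding layer already covers words that only occur in the test set.

```python
import re
from collections import Counter

# Hypothetical special tokens: padding and "unknown word".
PAD, UNK = "<pad>", "<unk>"


def tokenize(text):
    """Very simple tokenizer (assumption): lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents, min_count=1):
    """Map every word seen in `documents` to an integer id."""
    counts = Counter(tok for doc in documents for tok in tokenize(doc))
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate([PAD, UNK] + words)}


def encode(sentence, vocabulary, max_len):
    """Convert a sentence to a fixed-length list of ids; unseen words map to UNK."""
    ids = [vocabulary.get(tok, vocabulary[UNK]) for tok in tokenize(sentence)]
    ids = ids[:max_len]
    return ids + [vocabulary[PAD]] * (max_len - len(ids))


# Made-up example data, only to show the effect.
train_sentences = ["The employee signed the contract."]
test_sentences = ["The manager terminated the contract."]

# Vocabulary from training data only: test-only words collapse to UNK.
small_vocab = build_vocabulary(train_sentences)

# Vocabulary from all documents of the dataset, independent of the split.
full_vocab = build_vocabulary(train_sentences + test_sentences)

embedding_size = 50  # assumed value
embedding_shape = (len(full_vocab), embedding_size)

print(encode(test_sentences[0], small_vocab, max_len=6))
print(encode(test_sentences[0], full_vocab, max_len=6))
print(embedding_shape)
```

With the training-only vocabulary, "manager" and "terminated" both map to the same unknown id, whereas the full vocabulary gives each its own row in the embedding matrix. Those rows still receive no gradient updates from training data that never contains the words, so this approach is most useful when combined with pre-trained embeddings.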
