ATIS data split (train,dev,test) issues #8

@kpe

I noticed some issues with the data split in the ATIS dataset (see the visualization of the label distributions here):

  • 397 duplicated data samples (out of 5871)
  • no training data for 5 intent labels and 7 slot labels
  • up to 20% of the labels present in the training set do not appear at all in the dev or test sets
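
A minimal sketch of how checks like these could be reproduced, assuming each sample is available as a `(text, intent)` tuple. The helper names `find_duplicates` and `label_coverage` and the toy data are hypothetical, not part of this repository; the same set comparison would apply to slot labels.

```python
from collections import Counter


def find_duplicates(samples):
    """Return every sample that occurs more than once, with its count."""
    counts = Counter(samples)
    return {sample: n for sample, n in counts.items() if n > 1}


def label_coverage(train_labels, other_labels):
    """Compare the label sets of the training split and another split."""
    train, other = set(train_labels), set(other_labels)
    return {
        # labels the model is evaluated on but never sees in training
        "missing_from_train": sorted(other - train),
        # labels seen in training but never evaluated
        "missing_from_other": sorted(train - other),
    }


# Toy data standing in for the real splits (hypothetical examples).
train = [
    ("show flights to boston", "flight"),
    ("what is the fare", "airfare"),
    ("show flights to boston", "flight"),
]
test = [
    ("list airlines", "airline"),
    ("show flights to boston", "flight"),
]

dups = find_duplicates(train + test)
cov = label_coverage([intent for _, intent in train],
                     [intent for _, intent in test])
print(dups)
print(cov)
```

Counting duplicates over the concatenation of all splits also catches samples that leak from train into dev or test, not only repeats within a single split.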

I believe this could make model performance measures somewhat unreliable to interpret, so I tried to build an alternative, more balanced
data split here.
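
For illustration only, one way a more balanced split could be produced is to deduplicate first and then stratify by intent label. This is a sketch under stated assumptions, not necessarily how the linked split was built: the function name `stratified_split`, the 80/10/10 ratios, and the `(text, intent)` sample format are all hypothetical, and slot labels are ignored for simplicity.

```python
import random
from collections import defaultdict


def stratified_split(samples, dev_ratio=0.1, test_ratio=0.1, seed=0):
    """Deduplicate samples, then split each intent group into train/dev/test."""
    # dict.fromkeys drops exact duplicates while preserving order
    unique = list(dict.fromkeys(samples))

    by_intent = defaultdict(list)
    for text, intent in unique:
        by_intent[intent].append((text, intent))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, dev, test = [], [], []
    for intent in sorted(by_intent):
        group = by_intent[intent]
        rng.shuffle(group)
        # int() rounds down, so very rare intents stay entirely in train
        n_dev = int(len(group) * dev_ratio)
        n_test = int(len(group) * test_ratio)
        dev.extend(group[:n_dev])
        test.extend(group[n_dev:n_dev + n_test])
        train.extend(group[n_dev + n_test:])
    return train, dev, test


# Toy data (hypothetical): 20 "flight" and 10 "airfare" samples plus one duplicate.
samples = [(f"flight query {i}", "flight") for i in range(20)]
samples += [(f"fare query {i}", "airfare") for i in range(10)]
samples.append(("flight query 0", "flight"))

train, dev, test = stratified_split(samples)
print(len(train), len(dev), len(test))
```

Because the dev and test shares are rounded down per intent, every label that appears in dev or test is guaranteed to have training data, at the cost of rare labels not being evaluated at all.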
