Dataset processing and download. #1

@fissoreg

Description

The dataset currently in use is a stripped-down version of the Kaggle arXiv Dataset that retains only papers in the following categories: cs.AI, cs.CL, cs.CV, cs.LG, cs.MA, cs.NE.

We should self-host this dataset, provide the scripts used to process it, and keep it up to date with the original arXiv data.
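
As a starting point for the processing script, here is a minimal sketch of the category filter. It assumes the input is the Kaggle metadata snapshot in JSON Lines form, with one paper per line and a `categories` field holding a space-separated string; that layout should be checked against the actual download. The function names and file paths are hypothetical, and keeping a paper when *any* of its categories matches (rather than only its primary one) is an assumption, not necessarily how the current dataset was built.

```python
import json
from typing import Iterable, Iterator

# Categories named in this issue.
KEPT_CATEGORIES = frozenset({"cs.AI", "cs.CL", "cs.CV", "cs.LG", "cs.MA", "cs.NE"})


def filter_records(lines: Iterable[str], kept=KEPT_CATEGORIES) -> Iterator[dict]:
    """Yield records that list at least one retained category.

    Assumes each non-empty line is a JSON object whose "categories"
    field is a space-separated string (e.g. "cs.LG stat.ML").
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        categories = record.get("categories", "").split()
        if kept.intersection(categories):
            yield record


def process_snapshot(src_path: str, dst_path: str) -> int:
    """Stream src_path, write retained records to dst_path, return the count kept."""
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for record in filter_records(src):
            dst.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Something like `process_snapshot("arxiv-metadata-oai-snapshot.json", "arxiv-cs-subset.json")` (both paths are placeholders) could then be re-run whenever a new snapshot is published. Streaming line by line keeps memory use flat regardless of the size of the full metadata file.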
