
nemotron-cc-code in super3 pretraining data recipe should be publicly available #146

@verdimrc

Description


The .json files under https://github.com/NVIDIA-NeMo/Nemotron/blob/main/src/nemotron/recipes/super3/stage0_pretrain/config/data_prep/ still list nemotron-cc-code under _missing_categories.

If nemotron-cc-code is the same as https://huggingface.co/datasets/nvidia/Nemotron-CC-Code-v1, should these .json files be fixed?

  • data_blend_raw_long_context.json
  • data_blend_raw_phase1.json
  • data_blend_raw_phase2.json
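
To confirm which of these files still flag the dataset, a small check along the following lines might help. This is only a sketch: the internal layout of the blend files is not shown in this issue, so the code assumes `_missing_categories` is a JSON key (at any nesting depth) whose value is a list of category names, a dict keyed by category name, or a single string. The helper names `iter_missing`, `lists_as_missing`, and `check_dir` are hypothetical and not part of the Nemotron repo.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def iter_missing(node: Any) -> Iterator[Any]:
    """Yield every value stored under a `_missing_categories` key, at any depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "_missing_categories":
                yield value
            else:
                yield from iter_missing(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_missing(item)


def lists_as_missing(blend: Any, category: str = "nemotron-cc-code") -> bool:
    """Return True if `category` appears under any `_missing_categories` entry.

    ASSUMPTION: the entry is a list of names, a dict keyed by name, or a string.
    """
    for missing in iter_missing(blend):
        if isinstance(missing, (list, dict)) and category in missing:
            return True
        if missing == category:
            return True
    return False


def check_dir(config_dir: str, category: str = "nemotron-cc-code") -> Dict[str, bool]:
    """Map each data_blend_raw_*.json file name in `config_dir` to its check result."""
    return {
        path.name: lists_as_missing(json.loads(path.read_text()), category)
        for path in sorted(Path(config_dir).glob("data_blend_raw_*.json"))
    }
```

Running `check_dir` against a local checkout of the `data_prep` directory would return one boolean per blend file; any `True` entry is a file that still marks nemotron-cc-code as missing, under the layout assumptions above.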


Labels: bug
