The .json files in https://github.com/NVIDIA-NeMo/Nemotron/blob/main/src/nemotron/recipes/super3/stage0_pretrain/config/data_prep/ still list nemotron-cc-code under _missing_categories.
If nemotron-cc-code is the same as https://huggingface.co/datasets/nvidia/Nemotron-CC-Code-v1, should these .json files be fixed?
data_blend_raw_long_context.json
data_blend_raw_phase1.json
data_blend_raw_phase2.json
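
For anyone who wants to confirm this on a local checkout, here is a minimal sketch that reports which data_blend_raw_*.json files still mention nemotron-cc-code under _missing_categories. I have not assumed a particular schema: the script searches for a `_missing_categories` key at any nesting level and accepts a string, list, or dict as its value. The helper names (`mentions`, `listed_as_missing`, `check_dir`) are made up for illustration and are not part of the Nemotron repo.

```python
import json
from pathlib import Path

KEY = "_missing_categories"  # key name taken from the config files in question


def mentions(value, name):
    """Return True if `name` appears anywhere inside `value` (str, list, or dict)."""
    if isinstance(value, str):
        return value == name
    if isinstance(value, dict):
        return any(k == name or mentions(v, name) for k, v in value.items())
    if isinstance(value, list):
        return any(mentions(v, name) for v in value)
    return False


def listed_as_missing(doc, name):
    """Walk the parsed JSON and check every `_missing_categories` entry for `name`."""
    if isinstance(doc, dict):
        for k, v in doc.items():
            if k == KEY and mentions(v, name):
                return True
            if listed_as_missing(v, name):
                return True
    elif isinstance(doc, list):
        return any(listed_as_missing(v, name) for v in doc)
    return False


def check_dir(config_dir, name="nemotron-cc-code"):
    """Map each data_blend_raw_*.json file name to whether it still lists `name`."""
    return {
        p.name: listed_as_missing(json.loads(p.read_text()), name)
        for p in sorted(Path(config_dir).glob("data_blend_raw_*.json"))
    }


# Example, run from the repo root (assumes a local clone):
# print(check_dir("src/nemotron/recipes/super3/stage0_pretrain/config/data_prep"))
```

This only reports where the entry appears. Whether nemotron-cc-code really corresponds to Nemotron-CC-Code-v1 is still the open question above.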