NVIDIA-NeMo · mikemckiernan · May 13, 2026
diff --git a/docs/index.md b/docs/index.md
@@ -127,6 +127,7 @@ Each recipe family has its own stage layout, and all of them can be tracked thro
 - [Pre-training Datasets](https://huggingface.co/collections/nvidia/nemotron-pre-training-datasets) – open pre-training data
 - [Post-training Datasets](https://huggingface.co/collections/nvidia/nemotron-post-training-v3) – SFT and RL data
 - [Artifact Lineage](nemotron/artifacts.md) – W&B integration guide
+- [Model training steps](train-models/index.md) – SFT, PEFT, RL, and optimization with `nemotron step run`
 
 ```{toctree}
 :caption: Usage Cookbook
@@ -178,6 +179,19 @@ nemotron/artifacts.md
 customize/index.md
 ```
 
+```{toctree}
+:caption: Model Training
+:hidden:
+
+About <train-models/index.md>
+Getting Started <train-models/getting-started.md>
+Tips for Using Agents <train-models/using-skill.md>
+Concepts <train-models/explanation/index.md>
+Tutorials <train-models/tutorials/index.md>
+Tasks <train-models/how-to/index.md>
+Reference <train-models/reference/index.md>
+```
+
 ```{toctree}
 :caption: Nano3 Stages
 :hidden:

diff --git a/docs/train-models/explanation/artifact-graph.md b/docs/train-models/explanation/artifact-graph.md
@@ -0,0 +1,56 @@
+# Artifact Graph
+
+Training steps declare typed inputs and outputs so pipelines can reason about compatibility without reading every Python file.
+
+## Where Types Are Defined
+
+The file `src/nemotron/steps/types.toml` lists artifact kinds such as `training_jsonl` and `checkpoint_megatron`. It gives short descriptions and optional `is_a` or `convert_to` edges. When a step manifest names a type in `[[consumes]]` or `[[produces]]`, that name must align with that graph, or you must insert an explicit conversion step between stages.
+
+For example, running `uv run nemotron step show sft/automodel` shows the consumes and produces information:
+
+```{code-block} text
+:emphasize-lines: 8-12
+
+────────────────────────────── sft/automodel — SFT Training (AutoModel) ──────────────────────────────
+/path/to/nemotron/src/nemotron/steps/sft/automodel
+
+Supervised fine-tuning with the AutoModel stack for HF-format models and JSONL
+datasets that already use OpenAI chat-format messages. Supports full SFT and
+LoRA-style adapter tuning from the same step.
+
+Consumes
+  • training_jsonl — Instruction data in JSONL with a messages field
+
+Produces
+  • checkpoint_hf — HuggingFace checkpoint directory (full model or adapter-style PEFT output)
+
+Parameters
+  • peft (default=null) — Use 'lora' for adapter tuning, or 'null' for full fine-tuning.
+
+Runspec
+  launcher: torchrun
+  image: -
+  resources: nodes=1 gpus_per_node=4
+  config dir: /path/to/nemotron/src/nemotron/steps/sft/automodel/config
+  default config: default
+```
+
+## Common Acyclic Paths
+
+Typical supervised paths include the following chains:
+
+- JSON Lines (JSONL) AutoModel line: `training_jsonl` → `sft/automodel` → `checkpoint_hf`
+- Packed Megatron line: `training_jsonl` → packing prep → `packed_parquet` → `sft/megatron_bridge` → `checkpoint_megatron`
+
+A typical alignment path starts from a `checkpoint_megatron` policy, adds preference or reward-side data, runs one of the `rl/nemo_rl/...` steps, and ends at `checkpoint_megatron`.
+
+A typical compression path starts from `checkpoint_hf`, runs `optimize/modelopt/quantize`, and lands at `checkpoint_megatron`. Add conversion after quantization when the next consumer needs a Hugging Face layout again.
+
+## Tokenizer and Template Lock
+
+Matching artifact types is not enough for correctness. Tokenizer, chat template, and sequence length must stay consistent across every step that tokenizes text or loads weights for the same model line. A mismatch rarely fails loudly: training loss can look plausible while downstream quality is poor.
+
+## Related Reading
+
+- [Data and Checkpoint Formats](../how-to/data-and-checkpoint-formats.md)
+- [Training Stacks](training-stacks.md)
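Annotation on `artifact-graph.md`: the compatibility rule under "Where Types Are Defined" can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the step library's implementation. The `TYPES` dictionary is a hypothetical in-memory stand-in for `types.toml`, the edges in it are illustrative rather than copied from the real file, `example_subtype` is an invented name, and the semantics assumed here (`is_a` means "accepted wherever the parent type is accepted", `convert_to` means "an explicit conversion step exists") are a guess at the intent.

```python
# Hypothetical stand-in for src/nemotron/steps/types.toml.
# The edges below are illustrative assumptions, not the real file contents.
TYPES = {
    "training_jsonl": {},
    "packed_parquet": {},
    "checkpoint_hf": {"convert_to": ["checkpoint_megatron"]},
    "checkpoint_megatron": {"convert_to": ["checkpoint_hf"]},
    "example_subtype": {"is_a": "training_jsonl"},  # invented type name
}


def ancestors(type_name):
    """Return type_name followed by every parent reached through is_a edges."""
    chain = []
    seen = set()
    current = type_name
    while current is not None and current not in seen:
        seen.add(current)
        chain.append(current)
        current = TYPES.get(current, {}).get("is_a")
    return chain


def link(produced, consumed):
    """Classify how a produced artifact can feed a consumer.

    Returns "direct" when the types match (possibly through is_a),
    "convert" when an explicit conversion step is needed first,
    and "incompatible" otherwise.
    """
    lineage = ancestors(produced)
    if consumed in lineage:
        return "direct"
    for name in lineage:
        if consumed in TYPES.get(name, {}).get("convert_to", []):
            return "convert"
    return "incompatible"


print(link("checkpoint_hf", "checkpoint_hf"))
print(link("checkpoint_hf", "checkpoint_megatron"))
print(link("training_jsonl", "checkpoint_hf"))
```

Returning a three-way result instead of a boolean keeps the "insert an explicit conversion step" case visible, which is the case the page warns about.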
diff --git a/docs/train-models/explanation/index.md b/docs/train-models/explanation/index.md
@@ -0,0 +1,10 @@
+# Explanation
+
+This section holds conceptual material for training with Nemotron steps. It explains how artifacts relate and how training stacks differ.
+
+```{toctree}
+:maxdepth: 1
+
+artifact-graph
+training-stacks
+```
diff --git a/docs/train-models/explanation/training-stacks.md b/docs/train-models/explanation/training-stacks.md
@@ -0,0 +1,38 @@
+# Training Libraries
+
+Each step delegates to a concrete training or optimization library.
+The library decides how to load models, how to read data, and the native checkpoint layout.
+
+## NeMo AutoModel
+
+The steps `sft/automodel` and `peft/automodel` use the AutoModel library for training that centers on Hugging Face conventions.
+Data is usually OpenAI chat-formatted JSONL files that are read as-is.
+
+## Megatron Bridge
+
+The steps `sft/megatron_bridge` and `peft/megatron_bridge` use Megatron Bridge for large distributed runs.
+These steps expect packed Apache Parquet as input.
+They emit Megatron checkpoints or Megatron-format adapters.
+
+## NeMo RL
+
+Steps under `rl/nemo_rl/` delegate to NeMo RL for alignment algorithms.
+They assume a Megatron-format policy checkpoint as the warm start.
+Reward handling differs per algorithm.
+Refer to [Choose an RL Alignment Step](../how-to/choose-rl-step.md) for selection guidance.
+
+## NVIDIA Model Optimizer
+
+Steps under `optimize/modelopt/` call Model Optimizer flows and follow Megatron Bridge conventions for export.
+Quantization targets reduced-precision inference.
+Pruning changes the model architecture.
+Distillation transfers quality from a teacher checkpoint to a student checkpoint.
+
+## Choosing a Library
+
+Use the how-to guides for supervised fine tuning (SFT), parameter-efficient fine tuning (PEFT), and reinforcement learning (RL) to map requirements to a stack. Treat stack choice as sticky. Crossing stacks implies conversion steps and new performance tuning, not a single configuration flag change.
+
+## Related Reading
+
+- [Choose an SFT Backend](../how-to/choose-sft-backend.md)
+- [Execution through NeMo Run](../../nemo_runspec/nemo-run.md)
diff --git a/docs/train-models/getting-started.md b/docs/train-models/getting-started.md
@@ -0,0 +1,92 @@
+# Getting Started with Training Steps
+
+This page walks through one supervised fine tuning (SFT) run on DGX Cloud Lepton using the *tiny* configuration.
+The tiny configuration lives in `src/nemotron/steps/sft/automodel/config/tiny.yaml` and is meant for short validation before you scale work.
+The goal is to validate wiring, NeMo Run, and your environment profile on real multi-node hardware.
+
+## Prerequisites
+
+- You need a clone of the Nemotron repository with dependencies installed.
+  Run `uv sync --all-extras` from the repository root if you have not installed dependencies yet.
+- You need access to DGX Cloud Lepton with GPU nodes.
+  This path assumes two nodes with eight A100 80 GB GPUs per node, matching the `run.env` block in `src/nemotron/steps/sft/automodel/config/tiny.yaml`.
+- You need to set the `HF_TOKEN` environment variable.
+- You need to run `lep login` after synchronizing dependencies so that you are logged in to Lepton.
+
+## Procedure
+
+1. Create an `env.toml` file at the root of the repository, similar to the following example:
+
+   ```{literalinclude} _snippets/input/env.toml
+   :language: toml
+   ```
+
+   Contact your cluster administrator for the values to substitute for the placeholders.
+
+1. View the step manifest and run specification:
+
+   ```console
+   $ uv run nemotron step show sft/automodel
+   ```
+
+   ````{dropdown} Example Output
+   :icon: code-square
+
+   ```{literalinclude} _snippets/output/gs-show.txt
+   ```
+   ````
+
+1. Compile the job against your Lepton profile without submitting it.
+   The profile name `lepton-sft` must match a table in your root `env.toml`.
+
+   ```console
+   $ uv run nemotron step run sft/automodel --config tiny --run lepton-sft --dry-run
+   ```
+
+   ````{dropdown} Partial Output
+   ```text
+   Compiled Configuration
+
+   ╭─────────────────────────────────────────── run ───────────────────────────────────────────╮
+   │ env:                                                                                      │
+   │   nodes: 2                                                                                │
+   │   gpus_per_node: 8                                                                        │
+   │   nprocs_per_node: 8                                                                      │
+   │   executor: lepton                                                                        │
+   │   container_image: nvcr.io/nvidia/nemo-automodel:26.04                                    │
+   │   node_group: az-sat-lepton-001                                                           │
+   │   resource_shape: gpu.8xa100-80gb                                                         │
+   │   remote_job_dir: /mnt/lustre-shared/user/nemotron/.nemotron-jobs
+   ...
+   ```
+   ````
+
+1. Submit the sample SFT job:
+
+   ```console
+   $ uv run nemotron step run sft/automodel -c tiny -r lepton-sft
+   ```
+
+The sample `tiny` config sets small training and validation splits.
+To specify the output path for checkpoints, set `SFT_OUTPUT_DIR` before running or specify the `checkpoint.checkpoint_dir` CLI override.
+
+## Discover Other Steps
+
+List step identifiers the CLI knows about:
+
+```console
+$ uv run nemotron step list
+```
+
+Other training stacks, for example Megatron Bridge SFT, PEFT, reinforcement learning (RL), or optimization, have their own `consumes` requirements. Use the [How-To Guides](how-to/index.md) and [Reference](reference/step-catalog.md) when you move past this first SFT path.
+
+## Success Checks
+
+- The command `nemotron step show <step_id>` lists `consumes` and `produces` artifact types. Those types must line up with your pipeline when you chain steps.
+- A finished sample run leaves logs and job metadata where NeMo Run is configured to write them. See [Execution through NeMo Run](../nemo_runspec/nemo-run.md) for experiment layout.
+- If you change tokenizer, template, or sequence length, keep them consistent across every step that touches the same model line. The [Artifact Graph](explanation/artifact-graph.md) page explains why consistency matters.
+
+## Next Steps
+
+- Follow [First SFT Run with AutoModel](tutorials/first-sft-automodel.md) when you need to point `tiny.yaml` at your own data or change the base model.
+- Read [Choose an SFT Backend](how-to/choose-sft-backend.md) when you need Megatron Bridge instead of AutoModel.
diff --git a/docs/train-models/how-to/choose-peft-backend.md b/docs/train-models/how-to/choose-peft-backend.md
@@ -0,0 +1,35 @@
+# Choose a PEFT Backend
+
+Parameter-efficient fine tuning (PEFT) in Nemotron is implemented as dedicated steps that emit adapter checkpoints. Pick the backend that matches your base checkpoint format and your data path.
+
+## Options
+
+| Step id | Best when | Primary inputs | Primary output artifact |
+|---------|------------|------------------|---------------------------|
+| `peft/automodel` | You have a Hugging Face base, chat-formatted JSON Lines (JSONL), and a small graphics processing unit (GPU) count | `training_jsonl` | `checkpoint_lora` |
+| `peft/megatron_bridge` | You have a Megatron base checkpoint and packed Apache Parquet at scale | `packed_parquet`, `checkpoint_megatron` | `checkpoint_lora` |
+
+## Decision Flow
+
+1. If you have one to four GPUs and chat-formatted JSONL data, use `peft/automodel`.
+2. If you have eight or more GPUs, you already run Megatron packing, and you train adapters on a Megatron base, use `peft/megatron_bridge`.
+3. If deployment requires a merged Hugging Face model, plan `convert/merge_lora` after training. Before the merge, add any Megatron-to-Hugging Face conversion step that your pipeline needs. Adapter evaluation scores are not identical to merged model scores.
+
+## Sample Commands
+
+```console
+$ uv run nemotron step run peft/automodel -c tiny
+$ uv run nemotron step run peft/megatron_bridge -c tiny
+```
+
+The Megatron Bridge path needs compatible packed Parquet and a base `checkpoint_megatron` path that you set in the training configuration.
+
+## Success Criteria
+
+- You version adapter artifacts with base model id, data blend, rank, alpha, and target module set so you can reproduce runs.
+- You re-evaluate after merge when production uses merged weights.
+
+## Related Reading
+
+- [Choose an SFT Backend](choose-sft-backend.md)
+- [Data and Checkpoint Formats](data-and-checkpoint-formats.md)
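Annotation on `choose-peft-backend.md`: the first two rules of the Decision Flow can be written as a small function. This is a sketch only. The function name and the string labels for `data_format` and `base_format` are hypothetical, and setups that match neither documented rule (for example, five to seven GPUs) return `None` because the page does not say what to do with them.

```python
from typing import Optional


def choose_peft_step(gpu_count: int, data_format: str, base_format: str) -> Optional[str]:
    """Map a setup to a PEFT step id, following the documented Decision Flow.

    data_format and base_format are hypothetical labels invented for this
    sketch: "jsonl" or "packed_parquet", and "hf" or "megatron".
    Returns None when the setup matches neither documented rule.
    """
    # Rule 1: one to four GPUs with chat-formatted JSONL data.
    if 1 <= gpu_count <= 4 and data_format == "jsonl":
        return "peft/automodel"
    # Rule 2: eight or more GPUs, packed data, and a Megatron base.
    if gpu_count >= 8 and data_format == "packed_parquet" and base_format == "megatron":
        return "peft/megatron_bridge"
    return None


print(choose_peft_step(2, "jsonl", "hf"))
print(choose_peft_step(16, "packed_parquet", "megatron"))
print(choose_peft_step(6, "jsonl", "hf"))
```

The third call falls outside both rules, so a caller has to make that decision explicitly instead of inheriting a silent default.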
diff --git a/docs/train-models/how-to/choose-rl-step.md b/docs/train-models/how-to/choose-rl-step.md
@@ -0,0 +1,42 @@
+# Choose an RL Alignment Step
+
+Post-training alignment with NeMo RL is split into three steps under `rl/nemo_rl/`. The table uses short names: direct preference optimization (DPO), reinforcement learning with verifiable rewards (RLVR) paired with group relative policy optimization (GRPO), and reinforcement learning from human feedback (RLHF). Choose a step based on how the reward signal enters training, not based on model family alone.
+
+## Options
+
+| Step id | Reward source | Typical data shape | Output |
+|---------|---------------|-------------------|--------|
+| `rl/nemo_rl/dpo` | Static preference pairs | Prompt with chosen and rejected completions | `checkpoint_megatron` |
+| `rl/nemo_rl/rlvr` | Verifiable or programmatic checks | Prompt with answers, tests, or environment metadata | `checkpoint_megatron` |
+| `rl/nemo_rl/rlhf` | Learned reward or judge model | Prompts plus a reward model checkpoint | `checkpoint_megatron` |
+
+All three steps consume a warm-start policy in `checkpoint_megatron` format produced by Megatron-style supervised fine tuning (SFT). They do not train a policy from scratch.
+
+## Decision Flow
+
+1. If you only have pairwise preferences and no online reward, use `rl/nemo_rl/dpo`.
+2. If reward is deterministic, for example unit tests, answer match, or tool success, use `rl/nemo_rl/rlvr`.
+3. If a separate reward model or judge produces scores, use `rl/nemo_rl/rlhf`.
+4. For resource-server rewards or NeMo Gym style rewards, use the RLVR or RLHF configuration paths documented in each step's `SKILL.md` file and YAML configuration files. Some flows use `config/nemo_gym.yaml`.
+
+## Data Preparation
+
+When preference JSONL still contains Hugging Face placeholders or needs sharding resolution, run the RL prep step upstream. Inspect `prep/rl_prep` in the step tree. Read the manifest for your chosen `rl/nemo_rl/...` step for required `consumes` types.
+
+## Sample Commands
+
+```console
+$ uv run nemotron step run rl/nemo_rl/dpo -c tiny
+$ uv run nemotron step run rl/nemo_rl/rlvr -c tiny
+$ uv run nemotron step run rl/nemo_rl/rlhf -c tiny
+```
+
+## Success Criteria
+
+- You validate reward design on a small batch before you scale rollout count.
+- You track Kullback–Leibler (KL) drift, reward variance, response length, and held-out task metrics. Average reward alone is not sufficient.
+
+## Related Reading
+
+- [Execution through NeMo Run](../../nemo_runspec/nemo-run.md) describes Ray-backed RL workloads on supported executors.
+- [Training Stacks](../explanation/training-stacks.md) places NeMo RL in the wider stack picture.
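Annotation on `choose-rl-step.md`: the second success criterion asks for more than average reward. The sketch below shows one way these quantities might be computed for a single batch of rollouts, assuming you already have per-token log-probabilities of the sampled tokens under the current policy and under a frozen reference policy. The function name, the input layout, and the choice of the simple sample-mean KL estimator are assumptions for illustration. NeMo RL reports its own metrics, and this code does not reproduce them.

```python
from statistics import mean, pvariance


def rollout_stats(rewards, policy_logprobs, reference_logprobs):
    """Summarize one batch of rollouts.

    rewards: one float per response.
    policy_logprobs / reference_logprobs: one list per response holding
    per-token log-probabilities of the sampled tokens.
    """
    # Per-token log-ratio between the policy and the reference.
    token_gaps = [
        p - r
        for pol, ref in zip(policy_logprobs, reference_logprobs)
        for p, r in zip(pol, ref)
    ]
    lengths = [len(pol) for pol in policy_logprobs]
    return {
        "reward_mean": mean(rewards),
        "reward_variance": pvariance(rewards),
        "mean_response_length": mean(lengths),
        # Sample-mean estimate of per-token KL(policy || reference);
        # it relies on the tokens having been sampled from the policy.
        "kl_estimate": mean(token_gaps),
    }


stats = rollout_stats(
    rewards=[1.0, 0.0],
    policy_logprobs=[[-0.5, -0.5], [-1.0]],
    reference_logprobs=[[-1.0, -1.0], [-1.0]],
)
print(stats)
```

Watching `reward_mean` rise while `kl_estimate` or `mean_response_length` also climbs steadily is a common sign that the policy is drifting from the reference rather than improving on the task.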
diff --git a/docs/train-models/how-to/choose-sft-backend.md b/docs/train-models/how-to/choose-sft-backend.md
@@ -0,0 +1,38 @@
+# Choose an SFT Backend
+
+Supervised fine tuning (SFT) is implemented by two alternative steps that are not drop-in replacements for each other. Pick one step based on data format, checkpoint format, and scale.
+
+## Options
+
+| Step id | Best when | Primary input artifact | Primary output artifact |
+|---------|-----------|------------------------|-------------------------|
+| `sft/automodel` | You have OpenAI chat-formatted JSON Lines (JSONL), you want Hugging Face style checkpoints, or you want the smallest cluster footprint for iteration | `training_jsonl` | `checkpoint_hf` |
+| `sft/megatron_bridge` | You need distributed Megatron Bridge training with packed sequences and an Apache Parquet pipeline | `packed_parquet` | `checkpoint_megatron` |
+
+## Decision Flow
+
+1. If your data is already chat-formatted JSON Lines (JSONL) and downstream tools expect Hugging Face safetensors, start with `sft/automodel`.
+2. If your data is packed Parquet produced by the packing prep step, or you require Megatron distributed checkpoints without an export round trip, use `sft/megatron_bridge`.
+3. If you start on one backend and later need the other output format, plan an explicit conversion step in your pipeline. Do not switch backends silently without conversion.
+
+## Prerequisites for Megatron Bridge
+
+Megatron Bridge SFT expects packed Parquet that is compatible with the tokenizer and sequence length you will use in training. The pack size in prep must match the training sequence length. If they diverge, you risk shape errors mid-run.
+
+## Sample Commands
+
+```console
+$ uv run nemotron step run sft/automodel -c tiny
+$ uv run nemotron step run sft/megatron_bridge -c tiny
+```
+
+## Success Criteria
+
+- The commands `nemotron step show sft/automodel` and `nemotron step show sft/megatron_bridge` list the `consumes` types your workspace must provide.
+- Loss decreases on a small slice before you scale data or learning rate.
+- Tokenizer, chat template, and sequence length stay aligned with evaluation and with any later reinforcement learning (RL) step that reuses the policy.
+
+## Related Reading
+
+- [Data and Checkpoint Formats](data-and-checkpoint-formats.md)
+- [Artifact Graph](../explanation/artifact-graph.md)
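Annotation on `choose-sft-backend.md`: the prerequisite that pack size must match the training sequence length is cheap to check before a job is submitted. The sketch below assumes both configurations are already loaded as dictionaries. The key names `pack_size` and `seq_length` are hypothetical, so look up the real keys in the step's YAML files before adapting it.

```python
def check_pack_size(prep_config: dict, train_config: dict) -> None:
    """Raise ValueError when packing and training disagree on sequence length.

    The keys "pack_size" and "seq_length" are hypothetical names for this
    sketch, not keys confirmed by the step configuration files.
    """
    pack_size = prep_config["pack_size"]
    seq_length = train_config["seq_length"]
    if pack_size != seq_length:
        raise ValueError(
            f"pack_size={pack_size} does not match seq_length={seq_length}; "
            "re-run packing prep or change the training sequence length"
        )


# Matching values pass silently.
check_pack_size({"pack_size": 4096}, {"seq_length": 4096})

# Mismatched values fail before any GPU time is spent.
try:
    check_pack_size({"pack_size": 4096}, {"seq_length": 8192})
except ValueError as err:
    print(err)
```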
diff --git a/docs/train-models/how-to/data-and-checkpoint-formats.md b/docs/train-models/how-to/data-and-checkpoint-formats.md
@@ -0,0 +1,35 @@
+# Data and Checkpoint Formats
+
+Training steps declare compatible artifacts in each `step.toml` file. This page summarizes the types that most training steps use so you can chain steps without format surprises.
+
+## Canonical Definitions
+
+Artifact names and compatibility rules live in `src/nemotron/steps/types.toml`, at the root of the step library. Treat that file as the source of truth for names such as `training_jsonl`, `packed_parquet`, and `checkpoint_megatron`.
+
+## Frequently Used Types
+
+- The `training_jsonl` type means JSON Lines (JSONL) with a `messages` field in OpenAI chat shape. AutoModel supervised fine tuning (SFT) and parameter-efficient fine tuning (PEFT) consume it, and several reinforcement learning (RL) data paths consume it.
+- The `packed_parquet` type means packed shards with token columns and masks for Megatron Bridge style trainers. You get this type only after you run a packing prep step when you use the Parquet path.
+- The `checkpoint_hf` type means a checkpoint directory in Hugging Face layout, holding either a full model or adapter-style PEFT output.
+- The `checkpoint_megatron` type means Megatron distributed checkpoints sharded across parallel ranks.
+- The `checkpoint_lora` type means low-rank adaptation (LoRA) adapter weights. Many downstream tools still need merge or export before deployment.
+
+## Chaining Guidance
+
+1. AutoModel SFT never consumes `packed_parquet`. Megatron Bridge SFT does not consume raw JSON Lines (JSONL) for the packed pipeline.
+2. RL steps in this repository expect a Megatron policy checkpoint for warm start. If your SFT used AutoModel, insert the appropriate conversion step before RL.
+3. Optimization steps that start from `checkpoint_hf` require merged weights when the training output was a LoRA adapter.
+
+## Where to Look in the Tree
+
+Each step directory contains the following files:
+
+- `step.toml` holds the identifier, human-readable title, tags, `[[consumes]]`, `[[produces]]`, `[[parameters]]`, optional `[[strategies]]`, `[[errors]]`, and `[[models]]` blocks.
+- `config/default.yaml` holds primary configuration tuned for real workloads.
+- `config/tiny.yaml` holds reduced settings for short sample runs and plumbing validation.
+- Extra files such as `config/nemo_gym.yaml` appear only on steps that need alternate method profiles.
+
+## Related Reading
+
+- [Artifact Graph](../explanation/artifact-graph.md)
+- [Step Catalog (Training)](../reference/step-catalog.md)
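Annotation on `data-and-checkpoint-formats.md`: the Chaining Guidance can be checked mechanically by walking a pipeline and tracking which artifact types exist so far. The sketch below is illustrative only. The `STEPS` table is hand-written from the consumes and produces columns on the SFT and RL how-to pages, it models only the checkpoint input of the RL step, and it matches type names exactly without following `is_a` or `convert_to` edges.

```python
# Consumes/produces pairs as listed on the SFT and RL how-to pages.
# Only the checkpoint input of the RL step is modeled in this sketch.
STEPS = {
    "sft/automodel": {"consumes": ["training_jsonl"], "produces": ["checkpoint_hf"]},
    "sft/megatron_bridge": {"consumes": ["packed_parquet"], "produces": ["checkpoint_megatron"]},
    "rl/nemo_rl/dpo": {"consumes": ["checkpoint_megatron"], "produces": ["checkpoint_megatron"]},
}


def missing_inputs(pipeline, initial_artifacts):
    """Walk the pipeline in order and report inputs that nothing upstream provides."""
    available = set(initial_artifacts)
    problems = []
    for step_id in pipeline:
        spec = STEPS[step_id]
        for needed in spec["consumes"]:
            if needed not in available:
                problems.append((step_id, needed))
        available.update(spec["produces"])
    return problems


# AutoModel SFT followed directly by RL: the RL step lacks a Megatron checkpoint.
print(missing_inputs(["sft/automodel", "rl/nemo_rl/dpo"], ["training_jsonl"]))

# Megatron Bridge SFT followed by RL: every input is satisfied.
print(missing_inputs(["sft/megatron_bridge", "rl/nemo_rl/dpo"], ["packed_parquet"]))
```

The first pipeline reports `rl/nemo_rl/dpo` as missing `checkpoint_megatron`, which is the situation the second chaining rule describes: an AutoModel run needs a conversion step before RL.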