feat(embed): multi-GPU finetune via shared nemo_runspec.execute_uv_local by oliverholworthy · Pull Request #165 · NVIDIA-NeMo/Nemotron

oliverholworthy · 2026-04-24T16:58:45Z

Summary

Adds shared local UV execution helpers in nemo_runspec.execution: execute_uv_local for locked stage-project execution and execute_uv_local_from_spec for runspec-aware launch handling.
Migrates embed eval, export, finetune, prep, and sdg local launch paths onto the shared helper, including conditional --extra tensorrt for export.
Updates embed finetune runspec metadata to launch = "torchrun" and gpus_per_node = "gpu" so local finetune launches through torch.distributed.run --nproc_per_node=gpu.
Pins Linux torch resolution for embed stage projects to the PyTorch cu129 index, including matching cu129 torch/torchvision wheels for export, and refreshes/adds the relevant UV lockfiles.
Moves temp pyproject generation into nemo_runspec._pyproject and reuses it from the container/Slurm run_uv.py wrapper so injected container excludes preserve optional deps, UV sources/indexes, and existing excludes.
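
As a rough illustration of the launch shapes listed above, the sketch below assembles a `uv run` argument list for a stage project. It is not the code in `nemo_runspec.execution`: `build_uv_command` and its parameters are hypothetical names. The layout mirrors the command line printed in the export log later in this conversation, with `--extra` and a `torch.distributed.run` prefix added only when requested.

```python
from pathlib import Path
from typing import Optional, Sequence


def build_uv_command(
    repo_root: Path,
    stage_dir: Path,
    script: Path,
    config: Path,
    extras: Sequence[str] = (),
    launch: Optional[str] = None,
    nproc_per_node: str = "gpu",
) -> list[str]:
    """Assemble an argv list for running a stage script in its own uv project.

    Hypothetical helper for illustration; not the real execute_uv_local.
    """
    cmd = ["uv", "run", "--with", str(repo_root), "--project", str(stage_dir)]

    # Optional dependency groups, e.g. "tensorrt" when export_to_trt=true.
    for extra in extras:
        cmd += ["--extra", extra]

    cmd.append("python")
    if launch == "torchrun":
        # torch.distributed.run starts one worker per visible GPU
        # when --nproc_per_node=gpu is given.
        cmd += ["-m", "torch.distributed.run", f"--nproc_per_node={nproc_per_node}"]

    cmd += [str(script), "--config", str(config)]
    return cmd


if __name__ == "__main__":
    repo = Path("/repo")
    stage = repo / "stage2_finetune"
    print(build_uv_command(repo, stage, stage / "train.py", Path("/jobs/train.yaml"),
                           launch="torchrun"))
```

Returning a list instead of a shell string keeps the arguments safe to pass to `subprocess.run` without quoting.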

Rationale

Local execution helpers belong with the existing execute_local machinery in nemo_runspec. Keeping the UV helper there lets embed and future recipe families share the same local execution path, while nemotron.kit.run_uv remains the container/Slurm wrapper and shares the pyproject-generation helper.

Test plan

uv run --extra dev pytest tests/nemo_runspec/test_execution_uv_spec.py tests/nemo_runspec/test_pyproject.py tests/recipes/embed/test_torch_sources.py tests/recipes/embed/test_cli.py
uv run --extra dev ruff check src/nemo_runspec/execution.py src/nemotron/kit/run_uv.py src/nemotron/recipes/embed/stage2_finetune/train.py tests/nemo_runspec/test_execution_uv_spec.py tests/nemo_runspec/test_pyproject.py tests/recipes/embed/test_torch_sources.py
python -m py_compile src/nemo_runspec/execution.py src/nemotron/kit/run_uv.py src/nemotron/recipes/embed/stage2_finetune/train.py tests/nemo_runspec/test_execution_uv_spec.py tests/nemo_runspec/test_pyproject.py tests/recipes/embed/test_torch_sources.py
git diff --check
nemotron embed finetune -c default on a single-GPU host.
nemotron embed finetune -c default on a multi-GPU host; confirm one worker per visible GPU.
nemotron embed {eval,export,prep,sdg} local smoke tests.
nemotron embed export ... export_to_trt=true still activates the TensorRT optional extra.
Remote/Slurm embed finetune smoke test with the intended env GPU settings; confirm torchrun launch and UV dependency sync.

shan-nvidia · 2026-05-08T17:03:22Z

@oliverholworthy Thanks for the fixes.
I tested the fine-tune stage and it now works with multiple GPUs. The eval stage can also use the GPUs with CUDA 12.9.
The export stage is failing, though. Here is the full log:

(nemotron) sthan@ipp1-3302:/raid/sthan/Nemotron$ CUDA_VISIBLE_DEVICES=1 nemotron embed export -c default

Compiled Configuration

╭──────────────────────────── run ─────────────────────────────╮
│ env:                                                         │
│   container: nvcr.io/nvidia/nemo:25.07                       │
│ mode: local                                                  │
│ profile: null                                                │
│ cli:                                                         │
│   argv:                                                      │
│   - /raid/sthan/Nemotron/.venv/bin/nemotron                  │
│   - embed                                                    │
│   - export                                                   │
│   - -c                                                       │
│   - default                                                  │
│   dotlist: []                                                │
│   passthrough: []                                            │
│   config: default                                            │
│ recipe:                                                      │
│   name: embed/export                                         │
│   script: src/nemotron/recipes/embed/stage4_export/export.py │
╰──────────────────────────────────────────────────────────────╯

╭───────────────────────────────────── config ─────────────────────────────────────╮
│ model_path: ./output/embed/stage2_finetune/checkpoints/LATEST/model/consolidated │
│ pooling_mode: avg                                                                │
│ normalize: true                                                                  │
│ attn_implementation: eager                                                       │
│ use_dimension_arg: true                                                          │
│ quant_cfg: null                                                                  │
│ calibration_batch_size: 64                                                       │
│ onnx_export_path: ./output/embed/stage4_export/onnx                              │
│ opset: 17                                                                        │
│ export_dtype: fp32                                                               │
│ export_to_trt: false                                                             │
│ trt_model_path: ./output/embed/stage4_export/tensorrt                            │
│ override_layernorm_precision_to_fp32: true                                       │
│ override_layers_to_fp32:                                                         │
│ - /model/norm/                                                                   │
│ - /pooling_module                                                                │
│ - /ReduceL2                                                                      │
│ - /Div                                                                           │
│ profiling_verbosity: layer_names_only                                            │
│ trt_min_batch: 1                                                                 │
│ trt_opt_batch: 16                                                                │
│ trt_max_batch: 64                                                                │
│ trt_min_seq_len: 3                                                               │
│ trt_opt_seq_len: 128                                                             │
│ trt_max_seq_len: 256                                                             │
│ output_dir: ./output/embed/stage4_export                                         │
╰──────────────────────────────────────────────────────────────────────────────────╯


wandb: Currently logged in as: sthan (nvidia-merlin) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin

╭────────────────────────────────────────────────────────────────────────────────────────────╮
│ Job Submission                                                                             │
│ ├── configs                                                                                │
│ │   ├── job:   /raid/sthan/Nemotron/.nemotron/jobs/20260508-095818-embed-export/job.yaml   │
│ │   └── train: /raid/sthan/Nemotron/.nemotron/jobs/20260508-095818-embed-export/train.yaml │
│ ├── env                                                                                    │
│ │   ├── HF_HOME: /raid/sthan/.cache/huggingface                                            │
│ │   └── WANDB_API_KEY: ✓ detected                                                          │
│ └── mode: local                                                                            │
╰────────────────────────────────────────────────────────────────────────────────────────────╯

Executing with uv isolated environment: /raid/sthan/bin/uv run --with /raid/sthan/Nemotron --project /raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export python /raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/export.py --config /raid/sthan/Nemotron/.nemotron/jobs/20260508-095818-embed-export/train.yaml
Uninstalled 1 package in 2ms
Installed 1 package in 22ms
    Updated https://github.com/NVIDIA-NeMo/Run.git (bfc53ac5af751982b119f0e6d59b53c53e81e86c)
Installed 148 packages in 143ms
🚀 Embedding Model Export to ONNX/TensorRT
============================================================
Model path:      output/embed/stage2_finetune/checkpoints/LATEST/model/consolidated
Pooling mode:    avg
Normalize:       True
Attention impl:  eager
Quantization:    None
ONNX output:     output/embed/stage4_export/onnx
Export to TRT:   False
============================================================

📦 Loading embedding model from: output/embed/stage2_finetune/checkpoints/LATEST/model/consolidated
/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/cuda/__init__.py:61: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
[ERROR] `cache_position` is part of LlamaBidirectionalModel.forward's signature, but not documented. Make sure to add it to the docstring of the function in /raid/sthan/.cache/huggingface/modules/transformers_modules/consolidated/llama_bidirectional_model.py.
Loading weights: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 146/146 [00:00<00:00, 5632.42it/s]
   Model loaded successfully

📤 Exporting to ONNX...
/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/megatron/core/distributed/fsdp/src/megatron_fsdp/utils.py:108: UserWarning: Transformer Engine and Apex are not installed. Falling back to local implementations of multi_tensor_applier and multi_tensor_scale
  warnings.warn(
fused_indices_to_multihot has reached end of life. Please migrate to a non-experimental function.
/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/megatron/core/models/backends.py:21: UserWarning: Apex is not installed. Falling back to Torch Norm
  warnings.warn("Apex is not installed. Falling back to Torch Norm")
/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/megatron/core/models/gpt/gpt_layer_specs.py:67: UserWarning: Apex is not installed. Falling back to Torch Norm
  warnings.warn("Apex is not installed. Falling back to Torch Norm")
/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/modelopt/torch/__init__.py:36: UserWarning: transformers version 5.6.2 is not tested with nvidia-modelopt and may cause issues. Please install recommended version with `pip install nvidia-modelopt[hf]` if working with HF models.
  _warnings.warn(
PyTriton is not available.
  Exporting to ONNX (opset 17, dtype fp32)...
/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/export.py:239: DeprecationWarning: You are using the legacy TorchScript-based ONNX export. Starting in PyTorch 2.9, the new torch.export-based ONNX exporter has become the default. Learn more about the new export logic: https://docs.pytorch.org/docs/stable/onnx_export.html. For exporting control flow: https://pytorch.org/tutorials/beginner/onnx/export_control_flow_model_to_onnx_tutorial.html
  return original_export(*args, **kwargs)
[transformers] `cache_position` is deprecated as an arg, and will be removed in Transformers v5.6. Please use `q_length` and `q_offset` instead, similarly to `kv_length` and `kv_offset`
Traceback (most recent call last):
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/export.py", line 469, in <module>
    main()
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/export.py", line 465, in main
    return run_export(cfg)
           ^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/export.py", line 420, in run_export
    onnx_exporter = export_to_onnx(model, tokenizer, cfg)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/export.py", line 245, in export_to_onnx
    onnx_exporter.export(
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/nemo_export/onnx_llm_exporter.py", line 194, in export
    self._export_to_onnx(
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/nemo_export/onnx_llm_exporter.py", line 233, in _export_to_onnx
    torch.onnx.export(
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/export.py", line 239, in forced_legacy_export
    return original_export(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/onnx/__init__.py", line 334, in export
    export(
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/onnx/_internal/torchscript_exporter/utils.py", line 552, in export
    _export(
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/onnx/_internal/torchscript_exporter/utils.py", line 1515, in _export
    graph, params_dict, torch_out = _model_to_graph(
                                    ^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/onnx/_internal/torchscript_exporter/utils.py", line 1113, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/onnx/_internal/torchscript_exporter/utils.py", line 997, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/onnx/_internal/torchscript_exporter/utils.py", line 903, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1263, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/jit/_trace.py", line 1439, in _get_trace_graph
    outs = ONNXTracedModule(
           ^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/jit/_trace.py", line 140, in forward
    graph, _out = torch._C._create_graph_by_tracing(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/jit/_trace.py", line 131, in wrapper
    outs.append(self.inner(*trace_inputs))
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1769, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/nemo_export/model_adapters/embedding/embedding_adapter.py", line 100, in forward
    outputs = self.model(**inputs)
              ^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1769, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/.cache/huggingface/modules/transformers_modules/consolidated/llama_bidirectional_model.py", line 259, in forward
    bidirectional_mask = self._create_bidirectional_mask(inputs_embeds, attention_mask)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/.cache/huggingface/modules/transformers_modules/consolidated/llama_bidirectional_model.py", line 212, in _create_bidirectional_mask
    return create_bidirectional_mask(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/.cache/uv/archive-v0/2tGhL6Zpeqn4nEqdhHy7E/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 171, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/.cache/uv/archive-v0/2tGhL6Zpeqn4nEqdhHy7E/lib/python3.12/site-packages/transformers/masking_utils.py", line 1071, in create_bidirectional_mask
    attention_mask = mask_interface(
                     ^^^^^^^^^^^^^^^
  File "/raid/sthan/.cache/uv/archive-v0/2tGhL6Zpeqn4nEqdhHy7E/lib/python3.12/site-packages/transformers/masking_utils.py", line 593, in eager_mask
    mask = sdpa_mask(
           ^^^^^^^^^^
  File "/raid/sthan/.cache/uv/archive-v0/2tGhL6Zpeqn4nEqdhHy7E/lib/python3.12/site-packages/transformers/masking_utils.py", line 492, in sdpa_mask
    q_length, q_offset = q_length.shape[0], q_length[0].to(device)
                         ~~~~~~~~~~~~~~^^^
IndexError: tuple index out of range
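
One plausible reading of the final frame, offered as an assumption and not a confirmed diagnosis: `q_length` arrives as a zero-dimensional value, so its `shape` is empty and `shape[0]` has nothing to index. PyTorch's `torch.Size` is a `tuple` subclass, which would explain the `tuple index out of range` wording. The sketch below models that with a plain tuple and hypothetical `first_dim` and `describe` helpers; it does not import torch or transformers.

```python
def first_dim(shape: tuple) -> int:
    """Return the leading dimension, mirroring `q_length.shape[0]` in the trace."""
    return shape[0]


def describe(shape: tuple) -> str:
    """Report the leading dimension, or the IndexError message if there is none."""
    try:
        return f"leading dim = {first_dim(shape)}"
    except IndexError as exc:
        return f"IndexError: {exc}"


# A 1-D value with 128 positions has shape (128,), so indexing works.
print(describe((128,)))
# A zero-dimensional (scalar) value has shape (), so shape[0] fails.
print(describe(()))
```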

shan-nvidia · 2026-05-08T17:13:14Z

Also, the export stage fails during dependency setup because its pyproject.toml allows Python 3.14, but nvidia-resiliency-ext==0.5.0 only has cp312 wheels.

shan-nvidia

LGTM.

…ecution

Add execute_uv_local alongside execute_local.

When torch is already importable (e.g., inside an NVIDIA container), creates a venv with --system-site-packages and excludes torch from UV resolution. This avoids the CUDA version mismatch where UV's torch-backend=auto detects the kernel driver's CUDA version (via nvidia-smi) but the container's libcuda.so is a different version.

When torch is NOT importable (bare machine), falls back to uv run --with torch with UV_TORCH_BACKEND=auto.

Move _write_temp_pyproject into nemo_runspec._pyproject so both the new helper and nemotron.kit.run_uv (the remote/Slurm wrapper) share one implementation.

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>

Replace the duplicated inline _execute_uv_local bodies in embed {eval,export,finetune,prep,sdg}.py with calls to the shared nemo_runspec.execution.execute_uv_local helper. No behavior change on the bare-machine path; gains container-torch / CUDA-mismatch handling automatically when torch is pre-installed.

export.py passes the tensorrt extra via the new extras= kwarg instead of assembling --extra on the command line directly.

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>

Use torch.distributed.run with --nproc_per_node=gpu so training automatically uses all available GPUs (works correctly with 1 GPU too). The local path goes through execute_uv_local's pre_script_args hook; the remote path is selected by the PEP 723 launch = "torchrun" header.

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>

…ecution

Add execute_uv_local alongside execute_local.

When torch is already importable (e.g., inside an NVIDIA container), creates a venv with --system-site-packages and excludes torch from UV resolution. This avoids the CUDA version mismatch where UV's torch-backend detects the kernel driver's CUDA version but the container's libcuda.so is a different version.

On bare machines, runs `uv run --project <stage>` against the stage's pyproject + uv.lock with no extra `--with torch`. If the stage declares mutually-exclusive cuXXX optional-dependencies (the standard UV multi-CUDA pattern from https://docs.astral.sh/uv/guides/integration/pytorch/), auto-detects the host's NVIDIA driver and passes `--extra cuXXX` so UV picks a torch wheel matching the driver.

The driver detection logic and the driver→cuXXX table are ported from astral-sh/uv (MIT/Apache-2.0):
- crates/uv-torch/src/accelerator.rs (detection order: env override, /sys/module/nvidia/version, /proc/driver/nvidia/version, nvidia-smi)
- crates/uv-torch/src/backend.rs (LINUX_CUDA_DRIVERS table)

UV_TORCH_BACKEND=auto is intentionally NOT set: it is honored by `uv pip`/`uv add`/`uv sync`, not by `uv run --with`/`uv run --project`, so it would be a no-op here (per Steve Han's investigation).

Move _write_temp_pyproject into nemo_runspec._pyproject so both the new helper and nemotron.kit.run_uv (the remote/Slurm wrapper) share one implementation.

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
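
The detection order named in this commit message can be sketched as follows. This is a minimal illustration, not the ported code: `detect_driver_version`, `read_file`, and `query_nvidia_smi` are hypothetical names, and the override variable `UV_CUDA_DRIVER_VERSION` is an assumption about what "env override" refers to. The driver→cuXXX table is left out because its values are not given here.

```python
import re
import subprocess
from typing import Callable, Mapping, Optional

# Matches a dotted driver version such as "550.54.15".
_VERSION_RE = re.compile(r"\b(\d+\.\d+(?:\.\d+)?)\b")


def read_file(path: str) -> Optional[str]:
    """Return the file's text, or None if it cannot be read."""
    try:
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    except OSError:
        return None


def query_nvidia_smi() -> Optional[str]:
    """Return nvidia-smi's driver version output, or None if unavailable."""
    try:
        proc = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return None
    return proc.stdout if proc.returncode == 0 else None


def detect_driver_version(
    env: Mapping[str, str],
    read_text: Callable[[str], Optional[str]] = read_file,
    run_nvidia_smi: Callable[[], Optional[str]] = query_nvidia_smi,
    env_var: str = "UV_CUDA_DRIVER_VERSION",  # assumed override name
) -> Optional[str]:
    """Return the NVIDIA driver version from the first source that answers."""
    # 1. Explicit override from the environment.
    override = env.get(env_var, "").strip()
    if override:
        return override

    # 2. The sysfs file holds the bare version string.
    sysfs = read_text("/sys/module/nvidia/version")
    if sysfs and sysfs.strip():
        return sysfs.strip()

    # 3. The procfs file embeds the version inside a longer banner line.
    procfs = read_text("/proc/driver/nvidia/version")
    if procfs:
        match = _VERSION_RE.search(procfs)
        if match:
            return match.group(1)

    # 4. Last resort: ask nvidia-smi (one line per GPU; take the first).
    smi = run_nvidia_smi()
    if smi and smi.strip():
        return smi.strip().splitlines()[0].strip()
    return None
```

The readers are injected as parameters so the probe order can be exercised on a machine without a GPU.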

Add a [tool.uv.sources] override in each embed stage's pyproject.toml mapping torch to the pytorch-cu129 index on Linux. cu129 wheels are forward-compatible with cu13.x drivers (newer drivers run older toolkit binaries) and match the existing torch source declared by nemo-automodel (commit ecd7cb4), so the resolver doesn't fall through to the latest PyPI default (currently cu130) which fails on hosts with a CUDA 12.9 driver.

Multi-extra (cu129/cu130) was attempted first but UV reported "conflicting indexes for package torch in all marker environments" because nemo-automodel pins Linux to pytorch-cu129 unconditionally and that mapping is honored even when nemo-automodel is a transitive git-installed dependency. Aligning with nemo-automodel's pattern is the simplest path that resolves cleanly. When nemo-automodel adopts multi-extra, this can be revisited.

The CLI's CUDA-extra auto-detection (in nemo_runspec.execution._pick_cuda_extra) remains in place but is inert for these stages because no cu* extras are declared.

NOTE: each stage's uv.lock needs to be regenerated:

    for s in stage0_sdg stage1_data_prep stage2_finetune stage3_eval stage4_export; do
      uv lock --project src/nemotron/recipes/embed/$s
    done

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>

oliverholworthy force-pushed the oholworthy/embed-finetune-multi-gpu branch from c9f1fc8 to 45639dc Compare April 28, 2026 14:26

oliverholworthy self-assigned this May 8, 2026

oliverholworthy marked this pull request as ready for review May 8, 2026 13:41

oliverholworthy requested review from marcromeyn and shan-nvidia May 8, 2026 13:42

marcromeyn approved these changes May 13, 2026


shan-nvidia approved these changes May 15, 2026


oliverholworthy added 12 commits May 18, 2026 10:56

Pin embed torch wheels to cu129

74633d0

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>

Remove CUDA extra auto-detection

ce7cccb

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>

Use runspec launch metadata for embed local runs

c743d39

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>

Simplify embed local UV torch handling

4ed8a08

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>

Fix embed multi-GPU PR review issues

b5e0ea0

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>

fix(embed): align export deps with finetune

e8ac599

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>

fix(embed): pin finetune stage python

51ae0de

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>

oliverholworthy force-pushed the oholworthy/embed-finetune-multi-gpu branch from 013b3bb to 51ae0de Compare May 18, 2026 09:58

oliverholworthy merged commit de524b2 into main May 18, 2026
3 of 4 checks passed

oliverholworthy deleted the oholworthy/embed-finetune-multi-gpu branch May 18, 2026 10:00
