
feat(embed): multi-GPU finetune via shared nemo_runspec.execute_uv_local #165

Merged
oliverholworthy merged 12 commits into main from oholworthy/embed-finetune-multi-gpu on May 18, 2026

@oliverholworthy commented Apr 24, 2026

Summary

  • Adds shared local UV execution helpers in nemo_runspec.execution: execute_uv_local for locked stage-project execution and execute_uv_local_from_spec for runspec-aware launch handling.
  • Migrates embed eval, export, finetune, prep, and sdg local launch paths onto the shared helper, including conditional --extra tensorrt for export.
  • Updates embed finetune runspec metadata to launch = "torchrun" and gpus_per_node = "gpu" so local finetune launches through torch.distributed.run --nproc_per_node=gpu.
  • Pins Linux torch resolution for embed stage projects to the PyTorch cu129 index, including matching cu129 torch/torchvision wheels for export, and refreshes/adds the relevant UV lockfiles.
  • Moves temporary pyproject generation into nemo_runspec._pyproject and reuses it from the container/Slurm run_uv.py wrapper, so that when container excludes are injected, the generated pyproject still preserves optional dependencies, UV sources/indexes, and any existing excludes.
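The last Summary bullet is essentially a merge rule: add the injected excludes without losing anything already in the stage's pyproject. A minimal sketch of that rule, assuming the pyproject has been parsed into a plain dict (for example with tomllib). The function name `inject_excludes` and the key path `tool.uv.exclude-dependencies` are assumptions for illustration; the PR does not show the body of nemo_runspec._pyproject.

```python
import copy
from typing import Any, Iterable

# Assumption: the PR does not show which key holds the container excludes.
EXCLUDE_PATH = ("tool", "uv", "exclude-dependencies")


def inject_excludes(pyproject: dict[str, Any], excludes: Iterable[str]) -> dict[str, Any]:
    """Return a copy of `pyproject` with `excludes` merged into the exclude list.

    The whole document is deep-copied and only the exclude list is modified,
    so project.optional-dependencies, tool.uv.sources and tool.uv.index
    survive unchanged.
    """
    result = copy.deepcopy(pyproject)
    node = result
    for key in EXCLUDE_PATH[:-1]:
        node = node.setdefault(key, {})
    merged = list(node.get(EXCLUDE_PATH[-1], []))
    for name in excludes:
        if name not in merged:  # keep existing entries and their order
            merged.append(name)
    node[EXCLUDE_PATH[-1]] = merged
    return result


if __name__ == "__main__":
    stage = {
        "project": {"optional-dependencies": {"tensorrt": ["tensorrt"]}},
        "tool": {"uv": {"sources": {"torch": [{"index": "pytorch-cu129"}]},
                        "exclude-dependencies": ["example-pkg"]}},
    }
    print(inject_excludes(stage, ["torch"])["tool"]["uv"]["exclude-dependencies"])
    # ['example-pkg', 'torch']
```

Deep-copying the whole document, rather than rebuilding a pyproject from selected fields, is what keeps optional dependencies and UV sources/indexes intact without having to enumerate them.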

Rationale

Local execution helpers belong with the existing execute_local machinery in nemo_runspec. Keeping the UV helper there lets embed and future recipe families share the same local execution path, while nemotron.kit.run_uv remains the container/Slurm wrapper and shares the pyproject-generation helper.

Test plan

  • uv run --extra dev pytest tests/nemo_runspec/test_execution_uv_spec.py tests/nemo_runspec/test_pyproject.py tests/recipes/embed/test_torch_sources.py tests/recipes/embed/test_cli.py
  • uv run --extra dev ruff check src/nemo_runspec/execution.py src/nemotron/kit/run_uv.py src/nemotron/recipes/embed/stage2_finetune/train.py tests/nemo_runspec/test_execution_uv_spec.py tests/nemo_runspec/test_pyproject.py tests/recipes/embed/test_torch_sources.py
  • python -m py_compile src/nemo_runspec/execution.py src/nemotron/kit/run_uv.py src/nemotron/recipes/embed/stage2_finetune/train.py tests/nemo_runspec/test_execution_uv_spec.py tests/nemo_runspec/test_pyproject.py tests/recipes/embed/test_torch_sources.py
  • git diff --check
  • nemotron embed finetune -c default on a single-GPU host.
  • nemotron embed finetune -c default on a multi-GPU host; confirm one worker per visible GPU.
  • nemotron embed {eval,export,prep,sdg} local smoke tests.
  • nemotron embed export ... export_to_trt=true still activates the TensorRT optional extra.
  • Remote/Slurm embed finetune smoke test with the intended env GPU settings; confirm torchrun launch and UV dependency sync.
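For the multi-GPU item ("confirm one worker per visible GPU"), each worker can compare the `LOCAL_WORLD_SIZE` that torchrun sets against the GPUs it can see. With `--nproc_per_node=gpu`, torchrun derives the worker count from the visible CUDA devices, so the two should match. This is a hypothetical helper, not part of the PR's test suite; it takes the host GPU count as an argument so it runs without torch.

```python
from typing import Mapping, Optional


def visible_gpu_count(env: Mapping[str, str], device_count: int) -> int:
    """GPUs visible to this process.

    `device_count` is the number of GPUs on the host; it is passed in so the
    helper runs without torch. Assumes CUDA_VISIBLE_DEVICES lists valid IDs.
    """
    mask = env.get("CUDA_VISIBLE_DEVICES")
    if mask is None:
        return device_count
    # An empty mask hides every GPU; otherwise count the comma-separated IDs.
    return len([dev for dev in mask.split(",") if dev.strip()])


def check_worker_layout(env: Mapping[str, str], device_count: int) -> Optional[str]:
    """Return a problem description, or None if there is one worker per visible GPU."""
    if "LOCAL_WORLD_SIZE" not in env:
        return "not launched by torchrun (LOCAL_WORLD_SIZE is unset)"
    workers = int(env["LOCAL_WORLD_SIZE"])
    gpus = visible_gpu_count(env, device_count)
    if workers != gpus:
        return f"{workers} workers for {gpus} visible GPUs"
    return None
```

Inside a worker this would be called as `check_worker_layout(os.environ, host_gpu_count)`, ideally logged once from the worker with `LOCAL_RANK == 0`.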

@oliverholworthy force-pushed the oholworthy/embed-finetune-multi-gpu branch from c9f1fc8 to 45639dc on April 28, 2026 14:26
@oliverholworthy self-assigned this May 8, 2026
@oliverholworthy marked this pull request as ready for review May 8, 2026 13:41
@shan-nvidia

@oliverholworthy Thanks for the fixes.
I tested the fine-tune stage, and it works with multiple GPUs now. The eval stage can also use the GPUs with CUDA 12.9.
The export stage is failing, though. Here is the full log:

(nemotron) sthan@ipp1-3302:/raid/sthan/Nemotron$ CUDA_VISIBLE_DEVICES=1 nemotron embed export -c default

Compiled Configuration

╭──────────────────────────── run ─────────────────────────────╮
│ env:                                                         │
│   container: nvcr.io/nvidia/nemo:25.07                       │
│ mode: local                                                  │
│ profile: null                                                │
│ cli:                                                         │
│   argv:                                                      │
│   - /raid/sthan/Nemotron/.venv/bin/nemotron                  │
│   - embed                                                    │
│   - export                                                   │
│   - -c                                                       │
│   - default                                                  │
│   dotlist: []                                                │
│   passthrough: []                                            │
│   config: default                                            │
│ recipe:                                                      │
│   name: embed/export                                         │
│   script: src/nemotron/recipes/embed/stage4_export/export.py │
╰──────────────────────────────────────────────────────────────╯

╭───────────────────────────────────── config ─────────────────────────────────────╮
│ model_path: ./output/embed/stage2_finetune/checkpoints/LATEST/model/consolidated │
│ pooling_mode: avg                                                                │
│ normalize: true                                                                  │
│ attn_implementation: eager                                                       │
│ use_dimension_arg: true                                                          │
│ quant_cfg: null                                                                  │
│ calibration_batch_size: 64                                                       │
│ onnx_export_path: ./output/embed/stage4_export/onnx                              │
│ opset: 17                                                                        │
│ export_dtype: fp32                                                               │
│ export_to_trt: false                                                             │
│ trt_model_path: ./output/embed/stage4_export/tensorrt                            │
│ override_layernorm_precision_to_fp32: true                                       │
│ override_layers_to_fp32:                                                         │
│ - /model/norm/                                                                   │
│ - /pooling_module                                                                │
│ - /ReduceL2                                                                      │
│ - /Div                                                                           │
│ profiling_verbosity: layer_names_only                                            │
│ trt_min_batch: 1                                                                 │
│ trt_opt_batch: 16                                                                │
│ trt_max_batch: 64                                                                │
│ trt_min_seq_len: 3                                                               │
│ trt_opt_seq_len: 128                                                             │
│ trt_max_seq_len: 256                                                             │
│ output_dir: ./output/embed/stage4_export                                         │
╰──────────────────────────────────────────────────────────────────────────────────╯


wandb: Currently logged in as: sthan (nvidia-merlin) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin

╭────────────────────────────────────────────────────────────────────────────────────────────╮
│ Job Submission                                                                             │
│ ├── configs                                                                                │
│ │   ├── job:   /raid/sthan/Nemotron/.nemotron/jobs/20260508-095818-embed-export/job.yaml   │
│ │   └── train: /raid/sthan/Nemotron/.nemotron/jobs/20260508-095818-embed-export/train.yaml │
│ ├── env                                                                                    │
│ │   ├── HF_HOME: /raid/sthan/.cache/huggingface                                            │
│ │   └── WANDB_API_KEY: ✓ detected                                                          │
│ └── mode: local                                                                            │
╰────────────────────────────────────────────────────────────────────────────────────────────╯

Executing with uv isolated environment: /raid/sthan/bin/uv run --with /raid/sthan/Nemotron --project /raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export python /raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/export.py --config /raid/sthan/Nemotron/.nemotron/jobs/20260508-095818-embed-export/train.yaml
Uninstalled 1 package in 2ms
Installed 1 package in 22ms
    Updated https://github.com/NVIDIA-NeMo/Run.git (bfc53ac5af751982b119f0e6d59b53c53e81e86c)
Installed 148 packages in 143ms
🚀 Embedding Model Export to ONNX/TensorRT
============================================================
Model path:      output/embed/stage2_finetune/checkpoints/LATEST/model/consolidated
Pooling mode:    avg
Normalize:       True
Attention impl:  eager
Quantization:    None
ONNX output:     output/embed/stage4_export/onnx
Export to TRT:   False
============================================================

📦 Loading embedding model from: output/embed/stage2_finetune/checkpoints/LATEST/model/consolidated
/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/cuda/__init__.py:61: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
  import pynvml  # type: ignore[import]
[ERROR] `cache_position` is part of LlamaBidirectionalModel.forward's signature, but not documented. Make sure to add it to the docstring of the function in /raid/sthan/.cache/huggingface/modules/transformers_modules/consolidated/llama_bidirectional_model.py.
Loading weights: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████| 146/146 [00:00<00:00, 5632.42it/s]
   Model loaded successfully

📤 Exporting to ONNX...
/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/megatron/core/distributed/fsdp/src/megatron_fsdp/utils.py:108: UserWarning: Transformer Engine and Apex are not installed. Falling back to local implementations of multi_tensor_applier and multi_tensor_scale
  warnings.warn(
fused_indices_to_multihot has reached end of life. Please migrate to a non-experimental function.
/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/megatron/core/models/backends.py:21: UserWarning: Apex is not installed. Falling back to Torch Norm
  warnings.warn("Apex is not installed. Falling back to Torch Norm")
/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/megatron/core/models/gpt/gpt_layer_specs.py:67: UserWarning: Apex is not installed. Falling back to Torch Norm
  warnings.warn("Apex is not installed. Falling back to Torch Norm")
/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/modelopt/torch/__init__.py:36: UserWarning: transformers version 5.6.2 is not tested with nvidia-modelopt and may cause issues. Please install recommended version with `pip install nvidia-modelopt[hf]` if working with HF models.
  _warnings.warn(
PyTriton is not available.
  Exporting to ONNX (opset 17, dtype fp32)...
/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/export.py:239: DeprecationWarning: You are using the legacy TorchScript-based ONNX export. Starting in PyTorch 2.9, the new torch.export-based ONNX exporter has become the default. Learn more about the new export logic: https://docs.pytorch.org/docs/stable/onnx_export.html. For exporting control flow: https://pytorch.org/tutorials/beginner/onnx/export_control_flow_model_to_onnx_tutorial.html
  return original_export(*args, **kwargs)
[transformers] `cache_position` is deprecated as an arg, and will be removed in Transformers v5.6. Please use `q_length` and `q_offset` instead, similarly to `kv_length` and `kv_offset`
Traceback (most recent call last):
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/export.py", line 469, in <module>
    main()
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/export.py", line 465, in main
    return run_export(cfg)
           ^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/export.py", line 420, in run_export
    onnx_exporter = export_to_onnx(model, tokenizer, cfg)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/export.py", line 245, in export_to_onnx
    onnx_exporter.export(
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/nemo_export/onnx_llm_exporter.py", line 194, in export
    self._export_to_onnx(
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/nemo_export/onnx_llm_exporter.py", line 233, in _export_to_onnx
    torch.onnx.export(
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/export.py", line 239, in forced_legacy_export
    return original_export(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/onnx/__init__.py", line 334, in export
    export(
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/onnx/_internal/torchscript_exporter/utils.py", line 552, in export
    _export(
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/onnx/_internal/torchscript_exporter/utils.py", line 1515, in _export
    graph, params_dict, torch_out = _model_to_graph(
                                    ^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/onnx/_internal/torchscript_exporter/utils.py", line 1113, in _model_to_graph
    graph, params, torch_out, module = _create_jit_graph(model, args)
                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/onnx/_internal/torchscript_exporter/utils.py", line 997, in _create_jit_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/onnx/_internal/torchscript_exporter/utils.py", line 903, in _trace_and_get_graph_from_model
    trace_graph, torch_out, inputs_states = torch.jit._get_trace_graph(
                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1263, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/jit/_trace.py", line 1439, in _get_trace_graph
    outs = ONNXTracedModule(
           ^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/jit/_trace.py", line 140, in forward
    graph, _out = torch._C._create_graph_by_tracing(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/jit/_trace.py", line 131, in wrapper
    outs.append(self.inner(*trace_inputs))
                ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1769, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/nemo_export/model_adapters/embedding/embedding_adapter.py", line 100, in forward
    outputs = self.model(**inputs)
              ^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/Nemotron/src/nemotron/recipes/embed/stage4_export/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1769, in _slow_forward
    result = self.forward(*input, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/.cache/huggingface/modules/transformers_modules/consolidated/llama_bidirectional_model.py", line 259, in forward
    bidirectional_mask = self._create_bidirectional_mask(inputs_embeds, attention_mask)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/.cache/huggingface/modules/transformers_modules/consolidated/llama_bidirectional_model.py", line 212, in _create_bidirectional_mask
    return create_bidirectional_mask(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/.cache/uv/archive-v0/2tGhL6Zpeqn4nEqdhHy7E/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 171, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/raid/sthan/.cache/uv/archive-v0/2tGhL6Zpeqn4nEqdhHy7E/lib/python3.12/site-packages/transformers/masking_utils.py", line 1071, in create_bidirectional_mask
    attention_mask = mask_interface(
                     ^^^^^^^^^^^^^^^
  File "/raid/sthan/.cache/uv/archive-v0/2tGhL6Zpeqn4nEqdhHy7E/lib/python3.12/site-packages/transformers/masking_utils.py", line 593, in eager_mask
    mask = sdpa_mask(
           ^^^^^^^^^^
  File "/raid/sthan/.cache/uv/archive-v0/2tGhL6Zpeqn4nEqdhHy7E/lib/python3.12/site-packages/transformers/masking_utils.py", line 492, in sdpa_mask
    q_length, q_offset = q_length.shape[0], q_length[0].to(device)
                         ~~~~~~~~~~~~~~^^^
IndexError: tuple index out of range

@shan-nvidia

Also, the export stage fails during dependency setup because its pyproject.toml allows Python 3.14, but nvidia-resiliency-ext==0.5.0 only ships cp312 wheels.
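To see why setup fails, compare the Python versions a `requires-python` range admits with the CPython tags that actually have wheels. A minimal stdlib-only sketch: the two specifier strings in the demo are hypothetical (the stage's real `requires-python` is not shown here), and the parser only handles simple `>=`, `>`, `<=`, `<`, `==` clauses over major.minor versions. Real tooling should use `packaging.specifiers.SpecifierSet` instead.

```python
import operator

_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt, "==": operator.eq}


def _parse(version: str) -> tuple[int, int]:
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def admits(spec: str, python: str) -> bool:
    """True if major.minor `python` satisfies every comma-separated clause of `spec`."""
    target = _parse(python)
    for clause in spec.split(","):
        clause = clause.strip()
        # Check two-character operators before one-character ones.
        for op in (">=", "<=", "==", ">", "<"):
            if clause.startswith(op):
                if not _OPS[op](target, _parse(clause[len(op):].strip())):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def uncovered(spec: str, candidates: list[str], wheel_tags: set[str]) -> list[str]:
    """Pythons admitted by `spec` for which no cpXY wheel exists."""
    return [py for py in candidates
            if admits(spec, py) and "cp" + py.replace(".", "") not in wheel_tags]


if __name__ == "__main__":
    pythons = ["3.10", "3.11", "3.12", "3.13", "3.14"]
    wheels = {"cp312"}  # nvidia-resiliency-ext==0.5.0, per the comment above
    print(uncovered(">=3.10", pythons, wheels))        # ['3.10', '3.11', '3.13', '3.14']
    print(uncovered(">=3.12,<3.13", pythons, wheels))  # []
```

An open-ended range leaves versions with no installable wheel, while capping the range to the tags that exist leaves nothing uncovered.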


@shan-nvidia left a comment


LGTM.

…ecution

Add execute_uv_local alongside execute_local. When torch is already
importable (e.g., inside an NVIDIA container), creates a venv with
--system-site-packages and excludes torch from UV resolution. This
avoids the CUDA version mismatch where UV's torch-backend=auto detects
the kernel driver's CUDA version (via nvidia-smi) but the container's
libcuda.so is a different version.

When torch is NOT importable (bare machine), falls back to
uv run --with torch with UV_TORCH_BACKEND=auto.

Move _write_temp_pyproject into nemo_runspec._pyproject so both the
new helper and nemotron.kit.run_uv (the remote/Slurm wrapper) share
one implementation.

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
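A minimal sketch of the container-versus-bare-machine decision this commit describes. `torch_is_importable`, `LocalUvPlan` and `plan_local_uv` are hypothetical names, and the command lists only illustrate the two shapes the message names: a `--system-site-packages` venv with torch excluded from resolution, or `uv run --with torch`. The real helper's signature is not shown in this PR. A later commit below drops `UV_TORCH_BACKEND=auto` (it has no effect on `uv run`) and removes `--with torch` from the bare-machine path.

```python
import importlib.util
from dataclasses import dataclass, field


def torch_is_importable() -> bool:
    """True if `import torch` would succeed, without actually importing it."""
    return importlib.util.find_spec("torch") is not None


@dataclass
class LocalUvPlan:
    mode: str                                          # "container" or "bare"
    commands: list[list[str]]                          # commands to run in order
    exclude: list[str] = field(default_factory=list)   # kept out of UV resolution
    env: dict[str, str] = field(default_factory=dict)  # extra environment variables


def plan_local_uv(project: str, script: str, torch_available: bool) -> LocalUvPlan:
    if torch_available:
        # Reuse the container's torch (built against its own libcuda.so) rather
        # than letting UV pick a wheel from the kernel driver's CUDA version.
        venv = f"{project}/.venv"  # uv run --project uses <project>/.venv by default
        return LocalUvPlan(
            mode="container",
            commands=[
                ["uv", "venv", "--system-site-packages", venv],
                ["uv", "run", "--project", project, "python", script],
            ],
            exclude=["torch"],
        )
    # Bare machine: let UV install torch next to the stage project.
    return LocalUvPlan(
        mode="bare",
        commands=[["uv", "run", "--with", "torch", "--project", project, "python", script]],
        env={"UV_TORCH_BACKEND": "auto"},
    )
```

The `exclude` list is what the shared temporary-pyproject helper would write out, so UV leaves the container's torch alone.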
Replace the duplicated inline _execute_uv_local bodies in embed
{eval,export,finetune,prep,sdg}.py with calls to the shared
nemo_runspec.execution.execute_uv_local helper. No behavior change
on the bare-machine path; gains container-torch / CUDA-mismatch
handling automatically when torch is pre-installed.

export.py passes the tensorrt extra via the new extras= kwarg instead
of assembling --extra on the command line directly.

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
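The `extras=` kwarg amounts to turning a list of optional-dependency names into repeated `--extra` flags. A sketch with a hypothetical `uv_run_argv` builder, modeled on the command printed in the export log above (`uv run --with <repo> --project <stage> python <script> --config <yaml>`); only the flag shapes come from the PR, and `export_extras` is an illustrative stand-in for the export_to_trt check.

```python
from typing import Sequence


def uv_run_argv(
    repo: str,
    project: str,
    script: str,
    script_args: Sequence[str] = (),
    extras: Sequence[str] = (),
) -> list[str]:
    argv = ["uv", "run", "--with", repo, "--project", project]
    for extra in extras:
        # uv takes one --extra flag per optional-dependency group.
        argv += ["--extra", extra]
    return argv + ["python", script, *script_args]


def export_extras(export_to_trt: bool) -> list[str]:
    # Only pull in the TensorRT optional dependencies when they will be used.
    return ["tensorrt"] if export_to_trt else []
```

Keeping the flag assembly in one builder means each stage passes names rather than hand-building `--extra` on its own command line.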
Use torch.distributed.run with --nproc_per_node=gpu so training
automatically uses all available GPUs (works correctly with 1 GPU too).

The local path goes through execute_uv_local's pre_script_args hook;
the remote path is selected by the PEP 723 launch = "torchrun" header.

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
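One way the runspec launch metadata could map onto `pre_script_args`: with `launch = "torchrun"`, the arguments `-m torch.distributed.run --nproc_per_node=<gpus_per_node>` go between `python` and the script. The function names and the dict-shaped metadata are assumptions; `--nproc_per_node=gpu` is torchrun's own option for starting one worker per available GPU.

```python
from typing import Mapping, Sequence


def pre_script_args(meta: Mapping[str, str]) -> list[str]:
    """Arguments placed between `python` and the script path."""
    launch = meta.get("launch", "python")
    if launch == "torchrun":
        nproc = meta.get("gpus_per_node", "gpu")
        # "gpu" tells torchrun to start one worker per visible CUDA device.
        return ["-m", "torch.distributed.run", f"--nproc_per_node={nproc}"]
    if launch == "python":
        return []
    raise ValueError(f"unknown launch mode: {launch!r}")


def local_argv(meta: Mapping[str, str], script: str, script_args: Sequence[str]) -> list[str]:
    return ["python", *pre_script_args(meta), script, *script_args]
```

With the finetune metadata, `local_argv({"launch": "torchrun", "gpus_per_node": "gpu"}, "train.py", ["--config", "train.yaml"])` returns `['python', '-m', 'torch.distributed.run', '--nproc_per_node=gpu', 'train.py', '--config', 'train.yaml']`, which also works on a single-GPU host.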
…ecution

Add execute_uv_local alongside execute_local. When torch is already
importable (e.g., inside an NVIDIA container), creates a venv with
--system-site-packages and excludes torch from UV resolution. This
avoids the CUDA version mismatch where UV's torch-backend detects the
kernel driver's CUDA version but the container's libcuda.so is a
different version.

On bare machines, runs `uv run --project <stage>` against the stage's
pyproject + uv.lock with no extra `--with torch`. If the stage declares
mutually-exclusive cuXXX optional-dependencies (the standard UV
multi-CUDA pattern from
https://docs.astral.sh/uv/guides/integration/pytorch/), auto-detects
the host's NVIDIA driver and passes `--extra cuXXX` so UV picks a
torch wheel matching the driver.

The driver detection logic and the driver→cuXXX table are ported from
astral-sh/uv (MIT/Apache-2.0):
  - crates/uv-torch/src/accelerator.rs (detection order: env override,
    /sys/module/nvidia/version, /proc/driver/nvidia/version, nvidia-smi)
  - crates/uv-torch/src/backend.rs (LINUX_CUDA_DRIVERS table)

UV_TORCH_BACKEND=auto is intentionally NOT set — it is honored by
`uv pip`/`uv add`/`uv sync`, not by `uv run --with`/`uv run --project`,
so it would be a no-op here (per Steve Han's investigation).

Move _write_temp_pyproject into nemo_runspec._pyproject so both the
new helper and nemotron.kit.run_uv (the remote/Slurm wrapper) share
one implementation.

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
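A simplified Python sketch of the driver detection and extra selection this commit describes. It reads only the sysfs and procfs files (the env override and nvidia-smi fallback are omitted). The thresholds in `EXAMPLE_DRIVER_TABLE` are placeholders, not uv's LINUX_CUDA_DRIVERS values. `parse_driver_version`, `detect_driver_version` and `pick_cuda_extra` are hypothetical names; the PR's own function is `_pick_cuda_extra`, whose body is not shown.

```python
import re
from pathlib import Path
from typing import Iterable, Optional, Sequence

Version = tuple[int, ...]

# Placeholder thresholds (minimum driver, extra), newest first.
# Substitute the LINUX_CUDA_DRIVERS table from uv's backend.rs for real use.
EXAMPLE_DRIVER_TABLE: Sequence[tuple[Version, str]] = (
    ((580, 0), "cu130"),
    ((575, 0), "cu129"),
    ((570, 0), "cu128"),
)


def parse_driver_version(text: str) -> Optional[Version]:
    """Parse '550.54.14' (sysfs) or an 'NVRM version: ... Kernel Module ... 550.54.14 ...' line (procfs)."""
    match = re.search(r"Kernel Module.*?\s(\d+(?:\.\d+)+)(?=\s|$)", text)
    raw = match.group(1) if match else text.strip()
    if not re.fullmatch(r"\d+(?:\.\d+)+", raw):
        return None
    return tuple(int(part) for part in raw.split("."))


def detect_driver_version(
    paths: Iterable[Path] = (Path("/sys/module/nvidia/version"), Path("/proc/driver/nvidia/version")),
) -> Optional[Version]:
    for path in paths:  # same order as the commit: sysfs first, then procfs
        try:
            version = parse_driver_version(path.read_text())
        except OSError:
            continue  # file missing or unreadable: try the next source
        if version is not None:
            return version
    return None


def pick_cuda_extra(
    driver: Optional[Version],
    declared_extras: Iterable[str],
    table: Sequence[tuple[Version, str]] = EXAMPLE_DRIVER_TABLE,
) -> Optional[str]:
    """Newest declared cuXXX extra whose minimum driver is satisfied, else None."""
    if driver is None:
        return None
    cuda_extras = {e for e in declared_extras if re.fullmatch(r"cu\d+", e)}
    if not cuda_extras:
        return None  # no cu* extras declared: selection is inert
    for minimum, extra in table:
        if extra in cuda_extras and driver >= minimum:
            return extra
    return None
```

Returning None when a stage declares no `cu*` extras matches the later note that the auto-detection stays in place but is inert for the embed stages.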
Add a [tool.uv.sources] override in each embed stage's pyproject.toml
mapping torch to the pytorch-cu129 index on Linux. cu129 wheels are
forward-compatible with CUDA 13.x drivers (newer drivers run binaries
built with older toolkits) and match the existing torch source declared
by nemo-automodel (commit ecd7cb4), so the resolver doesn't fall through
to the latest PyPI default (currently cu130), which fails on hosts with
a CUDA 12.9 driver.

A multi-extra setup (cu129/cu130) was attempted first, but UV reported
"conflicting indexes for package torch in all marker environments"
because nemo-automodel pins Linux to pytorch-cu129 unconditionally and
that mapping is honored even when nemo-automodel is a transitive
git-installed dependency. Aligning with nemo-automodel's pattern is the
simplest path that resolves cleanly. When nemo-automodel adopts
multi-extra, this can be revisited.

The CLI's CUDA-extra auto-detection (in
nemo_runspec.execution._pick_cuda_extra) remains in place but is inert
for these stages because no cu* extras are declared.

NOTE: each stage's uv.lock needs to be regenerated:

  for s in stage0_sdg stage1_data_prep stage2_finetune stage3_eval stage4_export; do
    uv lock --project src/nemotron/recipes/embed/$s
  done

Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
Signed-off-by: Oliver Holworthy <1216955+oliverholworthy@users.noreply.github.com>
@oliverholworthy force-pushed the oholworthy/embed-finetune-multi-gpu branch from 013b3bb to 51ae0de on May 18, 2026 09:58
@oliverholworthy merged commit de524b2 into main on May 18, 2026
3 of 4 checks passed
@oliverholworthy deleted the oholworthy/embed-finetune-multi-gpu branch May 18, 2026 10:00