Example#1

Closed
zyzshishui wants to merge 4 commits into main from p2prdma


@zyzshishui
Owner

No description provided.

@zyzshishui zyzshishui closed this Apr 14, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 62b440b22a


Comment on lines +202 to +206
case "$1" in
head) run_head ;;
worker) run_worker ;;
submit) run_submit ;;
*) echo "Unknown command: $1"; exit 1 ;;
[P1] Implement print-ip command in 30B launcher script

scripts/launch-p2p-rdma.sh always probes node IPs by running ${TRAIN_SCRIPT} print-ip in get_container_ip, but the 30B script only handles head|worker|submit. Selecting TRAIN_SCRIPT=scripts/run-qwen3-30B-A3B-p2p.sh therefore makes start_ray fail with "Unknown command: print-ip" before cluster startup. Add a print-ip branch (as done in the 235B script) or make the launcher use a fallback probe for scripts without that subcommand.
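
One way the suggested print-ip branch could look is sketched below. The run_* bodies are placeholders, and print_ip is a hypothetical probe (an optional NODE_IP override, otherwise the first address from `hostname -I` on Linux); the 235B script's actual implementation is not shown in this review and may differ.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder bodies; the real script defines these elsewhere.
run_head()   { echo "starting head"; }
run_worker() { echo "starting worker"; }
run_submit() { echo "submitting job"; }

# Hypothetical probe: honor NODE_IP if set, otherwise take the first
# address reported by `hostname -I` (Linux-only).
print_ip() {
  if [ -n "${NODE_IP:-}" ]; then
    echo "$NODE_IP"
  else
    hostname -I | awk '{print $1}'
  fi
}

dispatch() {
  case "${1:-}" in
    head)     run_head ;;
    worker)   run_worker ;;
    submit)   run_submit ;;
    print-ip) print_ip ;;
    # The error goes to stderr so a caller capturing stdout never
    # mistakes it for an IP address.
    *) echo "Unknown command: ${1:-}" >&2; exit 1 ;;
  esac
}

# Only dispatch when a subcommand was actually passed.
if [ "$#" -gt 0 ]; then
  dispatch "$@"
fi
```

With this branch in place, a launcher that runs `${TRAIN_SCRIPT} print-ip` gets a single line on stdout instead of the "Unknown command" exit.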


--update-weight-transfer-mode p2p
--update-weight-buffer-size "$((1024 * 1024 * 1024))"
--check-weight-update-equal
"${extra_mismatch_args[@]}"
[P1] Guard optional extra_mismatch_args under nounset

With set -euo pipefail enabled, expanding "${extra_mismatch_args[@]}" aborts with an unbound-variable error unless that array is defined, and this name is not initialized in this script or in the sourced scripts/models/qwen3-30B-A3B.sh; as a result, the default submit path fails before ray job submit. Use a safe expansion (for example with a default-empty array) or initialize it explicitly.
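
A minimal sketch of the safe expansion, assuming the flags are collected into an array before being passed on. The function name build_submit_args and the array name submit_args are hypothetical; the `${name[@]+...}` form expands to nothing when the array is unset or empty, so it does not trip nounset.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: collect the weight-update flags into submit_args.
build_submit_args() {
  submit_args=(
    --update-weight-transfer-mode p2p
    --update-weight-buffer-size "$((1024 * 1024 * 1024))"
    --check-weight-update-equal
    # Expands to the quoted elements if the array has any,
    # and to nothing at all if it is unset or empty.
    ${extra_mismatch_args[@]+"${extra_mismatch_args[@]}"}
  )
}
```

The alternative the review mentions, initializing `extra_mismatch_args=()` near the top of the script, is simpler to read, but it must run before any sourced script that is meant to populate the array, or it will overwrite those values.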


-e HSA_FORCE_FINE_GRAIN_PCIE=1 \
-e HSA_NO_SCRATCH_RECLAIM=1 \
-e HF_HOME=/root/.cache/huggingface \
-e WANDB_KEY=<redacted> \
[P1] Remove hardcoded WANDB API key from launcher

This commit embeds a concrete Weights & Biases API key directly in the script and injects it into every container, which is a credential leak and allows unintended writes against that account by anyone who can read or run the script. Load the key from a secure external source (environment/secret store) instead of committing it in plaintext.
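
A sketch of loading the key from the caller's environment, assuming the `-e` flags above belong to a `docker run` invocation. The helper names require_wandb_key and build_docker_env are hypothetical. Passing `-e WANDB_KEY` with no value asks Docker to forward the host's exported value, so the secret never appears in the script.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Abort with a clear message if the key is missing or empty.
require_wandb_key() {
  : "${WANDB_KEY:?WANDB_KEY must be exported in the environment}"
}

# Hypothetical helper: build the env flags for `docker run`.
build_docker_env() {
  require_wandb_key
  docker_env=(
    -e HSA_FORCE_FINE_GRAIN_PCIE=1
    -e HSA_NO_SCRATCH_RECLAIM=1
    -e HF_HOME=/root/.cache/huggingface
    # Name only, no value: the container inherits the host's value.
    -e WANDB_KEY
  )
}
```

The array would then be spliced into the command as `docker run "${docker_env[@]}" ...`. Because the key in this commit has already been pushed, it should also be revoked and reissued; removing it from the file does not remove it from history.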

