[Bug] Agentic Training Failed with "rock: command not found" #412
Open
Description
I am trying to use ROLL for RL training. After the model service was installed and started, training failed with `rock: command not found`. This is strange, because the model service is clearly installed and running.
This is my config:
```yaml
defaults:
  - ../config/traj_envs@_here_
  - ../config/deepspeed_zero@_here_
  - ../config/deepspeed_zero2@_here_
  - ../config/deepspeed_zero3@_here_
  - ../config/deepspeed_zero3_cpuoffload@_here_

hydra:
  run:
    dir: .
  output_subdir: null

exp_name: "agentic_rollout_swe"
seed: 42
logging_dir: ./output/logs
output_dir: ./output
model_name: ${exp_name}-${now:%Y%m%d_%H%M%S}
rollout_dump_dir: ./output/rollout_dump

system_envs:
  USE_MODELSCOPE: '1'

num_gpus_per_node: 8 # change this
rpc_timeout: 72000

max_steps: 10
save_steps: 10
logging_steps: 1
eval_steps: 2
resume_from_checkpoint: false

rollout_batch_size: 1
val_batch_size: 1
sequence_length: 65536
max_tokens_per_step: 4096

advantage_clip: 0.2
ppo_epochs: 1
adv_estimator: "step_reinforce"
batch_adjust_mode: "random_sample"
step_reward_gamma: 1.0
#pg_clip: 0.1
#dual_clip_loss: True
init_kl_coef: 0.0
whiten_advantages: true
entropy_loss_coef: 0
max_grad_norm: 1.0

pretrain: /var/model/Qwen2.5-7B-Instruct # change this
reward_pretrain: /var/model/Qwen2.5-7B-Instruct # change this

actor_train:
  model_args:
    flash_attn: fa2
    disable_gradient_checkpointing: false
    dtype: bf16
    model_type: ~

actor_infer:
  model_args:
    flash_attn: fa2
    disable_gradient_checkpointing: true
    dtype: bf16
  generating_args:
    max_new_tokens: ${max_tokens_per_step} # single-turn response length
    top_p: 1.0
    top_k: 50
    num_beams: 1
    temperature: 1.0
    num_return_sequences: 1
    stop_strings: ["</tool_call>", "</tool_call>\n", "\n</tool_call>\n", "\n</function>"]
    include_stop_str_in_output: true
  data_args:
    template: qwen3_coder
  strategy_args:
    strategy_name: vllm
    strategy_config:
      gpu_memory_utilization: 0.8
      block_size: 16
      load_format: auto
      tensor_parallel_size: 1
  device_mapping: list(range(1,2))

reward_normalization:
  grouping: traj_group_id # tags(env_type)/traj_group_id(group)/batch(rollout_batch)... group_by reward/adv
  method: mean
  # norm_mean_type: batch
  # norm_std_type: group

train_env_manager:
  max_env_num_per_worker: 1
  num_env_groups: 1
  # under the same group, the env config and env seed are ensured to be equal
  group_size: 1
  tags: [swebench_native_verified]
  num_groups_partition: [1] # If not set, all env names divide nums equally. Under the same group, the env config and env seed (prompt) are equal in each generation
  system_envs:
    # if you cannot fetch the python env in rock due to a connection error, try this; the mirror link may expire in the future
    ROCK_RTENV_PYTHON_V31114_INSTALL_CMD: '[ -f cpython31115.tar.gz ] && rm cpython31115.tar.gz; [ -d python ] && rm -rf python; wget -q -O cpython31115.tar.gz https://mirror.nju.edu.cn/github-release/astral-sh/python-build-standalone/20260303/cpython-3.11.15+20260303-x86_64-unknown-linux-gnu-install_only.tar.gz && tar -xzf cpython31115.tar.gz && mv python runtime-env'

val_env_manager:
  max_env_num_per_worker: 1
  num_env_groups: 1
  group_size: 1 # should be set to 1 because val temperature is set to 0 and the same prompt leads to the same output
  tags: [swebench_native_verified]
  num_groups_partition: [1] # TODO: If not set, all env names divide nums equally. Under the same group, the env config and env seed (prompt) are equal in each generation
  system_envs:
    # if you cannot fetch the python env in rock due to a connection error, try this; the mirror link may expire in the future
    ROCK_RTENV_PYTHON_V31114_INSTALL_CMD: '[ -f cpython31115.tar.gz ] && rm cpython31115.tar.gz; [ -d python ] && rm -rf python; wget -q -O cpython31115.tar.gz https://mirror.nju.edu.cn/github-release/astral-sh/python-build-standalone/20260303/cpython-3.11.15+20260303-x86_64-unknown-linux-gnu-install_only.tar.gz && tar -xzf cpython31115.tar.gz && mv python runtime-env'

max_actions_per_traj: 60
env_manager_cls: roll.pipeline.agentic.env_manager.agent_native_env_manager.AgentNativeStepEnvManager

agent_config_common:
  agent_type: "default"
  # Startup command; placeholders (e.g., <<PROMPT>>) are parsed in the code
  run_cmd: 'iflow -p <<PROMPT>> --yolo'
  # Dependency pre-installation; modify based on your sandbox image
  pre_init_cmds:
    - command: "apt-get update"
      timeout_seconds: 600
    - command: "apt-get install -y curl git wget xz-utils"
      timeout_seconds: 600
    - command: "apt-get install -y build-essential libc6-dev patch procps npm"
      timeout_seconds: 600
    # Install helper tools like 'uv'
    - command: "wget -q https://xrl-sandbox-bucket.oss-cn-hangzhou.aliyuncs.com/uv-files/uv-x86_64-unknown-linux-gnu.tar.gz && tar -xzf uv-x86_64-unknown-linux-gnu.tar.gz --strip-components=1 -C /usr/local/bin && uv --version"
      timeout_seconds: 600
  model_service_config:
    type: "local"
    enabled: True
  # Runtime environment
  runtime_env_config:
    type: node
    npm_registry: "https://registry.npmmirror.com"
    # Install specific iflow versions as needed
    custom_install_cmd: "wget --retry-connrefused --tries=10 --waitretry=2 -O ~/iflow-cli.tgz 'http://cloud.iflow.cn/iflow-cli/iflow-ai-iflow-cli-for-roll-0-4-4-v5.tgz' && npm i -g ~/iflow-cli.tgz"
  env:
    # Configure iflow parameters as needed
    IFLOW_apiKey: "test"
    IFLOW_baseUrl: "http://localhost:8080/v1"
    IFLOW_modelName: "ROME"
    IFLOW_searchApiKey: "88888888"
    IFLOW_selectedAuthType: "openai-compatible"
    IFLOW_disableAutoUpdate: "true"
    IFLOW_tokensLimit: "128000"
    IFLOW_shellTimeout: "360000"
    IFLOW_coreTools: "Edit,exit_plan_mode,glob,list_directory,multi_edit,plan,read plan,read_file,read_many_files,save_memory,Search,Shell,task,web_fetch,web_search,write_file,xml_escape"

custom_envs:
  swebench_native_verified:
    env_type: "rock_tb_native_env"
    max_steps: ${max_actions_per_traj}
    max_tokens_per_step: ${max_tokens_per_step}
    env_manager_cls: ${env_manager_cls}
    agent_system_template: "agent_system_template placeholder"
    agent_template: "agent_template placeholder"
    env_config:
      dataset_name: /workspace/ROLL/data/swe_bench_verified_example.jsonl # change to your own data path
      tools: ~
      max_steps: ${max_actions_per_traj}
      mode: "val"
      sandbox_base_url: http://rock:8080 # change to your own service address if needed
      user_id: "xxx"
      experiment_id: "test_tb_native"
      test_files: ["/var/model-dataset/terminal-bench-datasets/datasets/swebench-verified/"]
    agent_config: ${agent_config_common}
```
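Since `sandbox_base_url` points at `http://rock:8080`, a quick reachability probe can separate "the sandbox service is down/unresolvable" from "the `rock` binary is missing". This is a diagnostic sketch I added for debugging, not part of the config; it assumes nothing about a health path and only reports the HTTP status the endpoint returns:

```shell
# Probe the endpoint configured as sandbox_base_url.
# Assumption: the rock sandbox answers plain HTTP on this host/port;
# adjust the URL to match your deployment.
probe_endpoint() {
  url="$1"
  # -s silent, -o discard body, -w print only the status code,
  # --max-time fails fast when the host is unresolvable or unreachable
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url" 2>/dev/null)
  if [ -z "$code" ] || [ "$code" = "000" ]; then
    echo "unreachable: $url"
  else
    echo "http $code from $url"
  fi
}

probe_endpoint "http://rock:8080"
```

If this prints `unreachable`, the container cannot resolve or reach the `rock` host at all, which is a different failure from the CLI error in the title.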
This is the script:
```bash
#!/bin/bash
set +x
CONFIG_PATH=$(basename $(dirname $0))
export PYTHONPATH="/workspace/ROLL:$PYTHONPATH"
python /workspace/ROLL/examples/start_agentic_rollout_pipeline.py --config_path agentic_demo --config_name agent_rollout_rock_swe_ack
```
Reproduce (base image: `roll-registry-vpc.cn-hangzhou.cr.aliyuncs.com/roll/pytorch:nvcr-25.06-py3-torch280-vllm0102`):

```bash
git clone https://github.com/alibaba/ROLL.git; \
cd ROLL; \
cp /var/roll-config/* /workspace/ROLL/examples/; \
cp /var/roll-config/requirements_ack.txt /workspace/ROLL/; \
pip install -r requirements_ack.txt -i https://mirrors.aliyun.com/pypi/simple/; \
pip install rl-rock "protobuf<4.0.0" -i https://mirrors.aliyun.com/pypi/simple/; \
ulimit -n 65536 && ray start --head --block --dashboard-agent-listen-port=52365 --dashboard-host=0.0.0.0 --memory=274877906944 --metrics-export-port=8080 --num-cpus=64

bash /workspace/ROLL/examples/run_agentic_rollout_pipeline_rock_swe_ack.sh
```
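Given the error is literally `rock: command not found`, one thing worth ruling out first is whether the CLI that `pip install rl-rock` is expected to provide is actually on `PATH` in the environment where the pipeline runs. The following is a diagnostic sketch I added, not from the original report; the console-script name `rock` is an assumption about what the `rl-rock` wheel installs:

```shell
# Diagnostic sketch: check whether a CLI is reachable on PATH, and if not,
# surface the likely pip script directories. Ray workers are often spawned
# from non-login shells, so ~/.local/bin may be missing from their PATH.
check_cli_on_path() {
  cli="$1"
  if command -v "$cli" >/dev/null 2>&1; then
    echo "found: $(command -v "$cli")"
  else
    echo "missing: $cli is not on PATH"
    return 1
  fi
}

if ! check_cli_on_path rock; then
  # Print where pip installs console scripts in this interpreter.
  python -c 'import sysconfig; print(sysconfig.get_path("scripts"))' 2>/dev/null || true
  ls ~/.local/bin 2>/dev/null | grep -i rock || true
fi
```

If the script directory printed above is not on the `PATH` seen by the Ray workers, exporting it before `ray start` (so workers inherit it) would be the fix to try.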