
[Bug] Agentic Training Failed with "rock: command not found" #412

@KunWuLuan

Description

I am trying to use ROLL for RL training, and I see the following error after the model service is installed. This is strange because the model service has already been installed and started.

(screenshot of the error: `rock: command not found`)
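For reference, here is a rough diagnostic sketch I would run inside the container to rule out an installation problem (the binary name `rock` is inferred from the error message; `rl-rock` is the pip package from the reproduce steps below):

```shell
# Rough diagnostic: is a `rock` executable on PATH, and is the
# rl-rock pip package installed? (binary name inferred from the error)
if command -v rock >/dev/null 2>&1; then
    echo "rock found at: $(command -v rock)"
else
    echo "rock NOT on PATH"
fi
pip show rl-rock >/dev/null 2>&1 && echo "rl-rock installed" || echo "rl-rock missing"
```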

This is my config:

    defaults:
      - ../config/traj_envs@_here_
      - ../config/deepspeed_zero@_here_
      - ../config/deepspeed_zero2@_here_
      - ../config/deepspeed_zero3@_here_
      - ../config/deepspeed_zero3_cpuoffload@_here_
    
    hydra:
      run:
        dir: .
      output_subdir: null
    
    exp_name: "agentic_rollout_swe"
    seed: 42
    
    logging_dir: ./output/logs
    output_dir: ./output
    model_name: ${exp_name}-${now:%Y%m%d_%H%M%S}
    rollout_dump_dir: ./output/rollout_dump
    system_envs:
      USE_MODELSCOPE: '1'
    
    num_gpus_per_node: 8 # change this
    rpc_timeout: 72000
    
    max_steps: 10
    save_steps: 10
    logging_steps: 1
    eval_steps: 2
    resume_from_checkpoint: false
    
    rollout_batch_size: 1
    val_batch_size: 1
    sequence_length: 65536
    
    max_tokens_per_step: 4096
    
    advantage_clip: 0.2
    ppo_epochs: 1
    adv_estimator: "step_reinforce"
    batch_adjust_mode: "random_sample"
    step_reward_gamma: 1.0
    
    #pg_clip: 0.1
    #dual_clip_loss: True
    init_kl_coef: 0.0
    whiten_advantages: true
    entropy_loss_coef: 0
    max_grad_norm: 1.0
    
    
    pretrain: /var/model/Qwen2.5-7B-Instruct # change this
    reward_pretrain: /var/model/Qwen2.5-7B-Instruct # change this
    
    actor_train:
      model_args:
        flash_attn: fa2
        disable_gradient_checkpointing: false
        dtype: bf16
        model_type: ~
    actor_infer:
      model_args:
        flash_attn: fa2
        disable_gradient_checkpointing: true
        dtype: bf16
      generating_args:
        max_new_tokens: ${max_tokens_per_step} # single-turn response length
        top_p: 1.0
        top_k: 50
        num_beams: 1
        temperature: 1.0
        num_return_sequences: 1
        stop_strings: ["</tool_call>","</tool_call>\n","\n</tool_call>\n","\n</function>"]
        include_stop_str_in_output: true
      data_args:
        template: qwen3_coder
      strategy_args:
        strategy_name: vllm
        strategy_config:
          gpu_memory_utilization: 0.8
          block_size: 16
          load_format: auto
          tensor_parallel_size: 1
      device_mapping: list(range(1,2))
    
    reward_normalization:
      grouping: traj_group_id # tags(env_type)/traj_group_id(group)/batch(rollout_batch)... group_by reward/adv
      method: mean
      # norm_mean_type: batch
      # norm_std_type: group
    
    train_env_manager:
      max_env_num_per_worker: 1
      num_env_groups: 1
      # under the same group, the env config and env seed are ensured to be equal
      group_size: 1
      tags: [swebench_native_verified]
      num_groups_partition: [1] # If not set, all env names divide nums equally. Under the same group, the env config and env seed (prompt) are equal in each generation
      system_envs:
        # if you cannot get a python env in rock due to a connection error, try this workaround (it may expire in the future)
        ROCK_RTENV_PYTHON_V31114_INSTALL_CMD: '[ -f cpython31115.tar.gz ] && rm cpython31115.tar.gz; [ -d python ] && rm -rf python; wget -q -O cpython31115.tar.gz https://mirror.nju.edu.cn/github-release/astral-sh/python-build-standalone/20260303/cpython-3.11.15+20260303-x86_64-unknown-linux-gnu-install_only.tar.gz && tar -xzf cpython31115.tar.gz && mv python runtime-env'
    val_env_manager:
      max_env_num_per_worker: 1
      num_env_groups: 1
      group_size: 1 # should be set to 1 because val temperature is set to 0 and same prompt leads to same output
      tags: [swebench_native_verified]
      num_groups_partition: [1] # TODO: If not set, all env names divide nums equally. Under the same group, the env config and env seed (prompt) are equal in each generation
      system_envs:
        # if you cannot get a python env in rock due to a connection error, try this workaround (it may expire in the future)
        ROCK_RTENV_PYTHON_V31114_INSTALL_CMD: '[ -f cpython31115.tar.gz ] && rm cpython31115.tar.gz; [ -d python ] && rm -rf python; wget -q -O cpython31115.tar.gz https://mirror.nju.edu.cn/github-release/astral-sh/python-build-standalone/20260303/cpython-3.11.15+20260303-x86_64-unknown-linux-gnu-install_only.tar.gz && tar -xzf cpython31115.tar.gz && mv python runtime-env'
    
    max_actions_per_traj: 60
    env_manager_cls: roll.pipeline.agentic.env_manager.agent_native_env_manager.AgentNativeStepEnvManager
    
    agent_config_common:
      agent_type: "default"
      
      # Startup command; placeholders (e.g., <<PROMPT>>) are parsed in the code
      run_cmd: 'iflow -p <<PROMPT>> --yolo'
      
      # Dependency pre-installation; modify based on your sandbox image
      pre_init_cmds:
        - command: "apt-get update"
          timeout_seconds: 600
        - command: "apt-get install -y curl git wget xz-utils"
          timeout_seconds: 600
        - command: "apt-get install -y build-essential libc6-dev patch procps npm"
          timeout_seconds: 600
        # Install helper tools like 'uv'
        - command: "wget -q https://xrl-sandbox-bucket.oss-cn-hangzhou.aliyuncs.com/uv-files/uv-x86_64-unknown-linux-gnu.tar.gz && tar -xzf uv-x86_64-unknown-linux-gnu.tar.gz --strip-components=1 -C /usr/local/bin && uv --version"
          timeout_seconds: 600 
    
      model_service_config: 
        type: "local"
        enabled: True
      
      # Runtime environment
      runtime_env_config:
        type: node
        npm_registry: "https://registry.npmmirror.com"
        # Install specific iflow versions as needed
        custom_install_cmd: "wget --retry-connrefused --tries=10 --waitretry=2 -O ~/iflow-cli.tgz 'http://cloud.iflow.cn/iflow-cli/iflow-ai-iflow-cli-for-roll-0-4-4-v5.tgz' && npm i -g ~/iflow-cli.tgz"
      
      env:
        # Configure iflow parameters as needed
        IFLOW_apiKey: "test"
        IFLOW_baseUrl: "http://localhost:8080/v1"
        IFLOW_modelName: "ROME"
        IFLOW_searchApiKey: "88888888"
        IFLOW_selectedAuthType: "openai-compatible"
        IFLOW_disableAutoUpdate: "true"
        IFLOW_tokensLimit: "128000"
        IFLOW_shellTimeout: "360000"
        IFLOW_coreTools: "Edit,exit_plan_mode,glob,list_directory,multi_edit,plan,read plan,read_file,read_many_files,save_memory,Search,Shell,task,web_fetch,web_search,write_file,xml_escape"

    custom_envs:
      swebench_native_verified:
        env_type: "rock_tb_native_env"
        max_steps: ${max_actions_per_traj}
        max_tokens_per_step: ${max_tokens_per_step}
        env_manager_cls: ${env_manager_cls}
        agent_system_template: "agent_system_template placeholder"
        agent_template: "agent_template placeholder"
        env_config:
          dataset_name: /workspace/ROLL/data/swe_bench_verified_example.jsonl # change to your own data path
          tools: ~
          max_steps: ${max_actions_per_traj}
          mode: "val"
          sandbox_base_url: http://rock:8080 # change to your own service address if needed
          user_id: "xxx"
          experiment_id: "test_tb_native"
          test_files: ["/var/model-dataset/terminal-bench-datasets/datasets/swebench-verified/"]
          agent_config: ${agent_config_common}
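To rule out a networking issue, I also checked whether the sandbox service itself is reachable (a quick sketch; the hostname `rock` and port 8080 are taken from `sandbox_base_url` in the config above, so adjust them if yours differ):

```shell
# Quick reachability check for sandbox_base_url (http://rock:8080).
# Hostname and port come from the config above; adjust if yours differ.
if curl -sf --max-time 5 http://rock:8080/ >/dev/null 2>&1; then
    echo "sandbox reachable"
else
    echo "sandbox NOT reachable at http://rock:8080"
fi
```

Note that the error in the screenshot looks like a missing `rock` *command*, not an unreachable HTTP service, so this check passing would point at the CLI installation rather than the network.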

This is the script:

    #!/bin/bash
    set +x
    
    CONFIG_PATH=$(basename $(dirname $0))
    export PYTHONPATH="/workspace/ROLL:$PYTHONPATH"
    python /workspace/ROLL/examples/start_agentic_rollout_pipeline.py --config_path agentic_demo  --config_name agent_rollout_rock_swe_ack

Reproduce:

Image: `roll-registry-vpc.cn-hangzhou.cr.aliyuncs.com/roll/pytorch:nvcr-25.06-py3-torch280-vllm0102`

    git clone https://github.com/alibaba/ROLL.git
    cd ROLL
    cp /var/roll-config/* /workspace/ROLL/examples/
    cp /var/roll-config/requirements_ack.txt /workspace/ROLL/
    pip install -r requirements_ack.txt -i https://mirrors.aliyun.com/pypi/simple/
    pip install rl-rock "protobuf<4.0.0" -i https://mirrors.aliyun.com/pypi/simple/
    # note: ray must be started without --block (or in the background),
    # otherwise the next command never runs
    ulimit -n 65536 && ray start --head --dashboard-agent-listen-port=52365 --dashboard-host=0.0.0.0 --memory=274877906944 --metrics-export-port=8080 --num-cpus=64
    bash /workspace/ROLL/examples/run_agentic_rollout_pipeline_rock_swe_ack.sh
