Modular L1-to-RL manipulation research stack for ENPM690.
This repository does not claim that full kitchen manipulation is solved. The current project result is the following pipeline:
Natural language / Qwen L1
-> structured IntentPacket
-> RL skill request
-> kinematic Approach -> Finisher policy stack
-> Gazebo/RViz demo path for visualization and controlled validation
The strongest validated motor-control result is still kinematic. Gazebo is used for live demo and integration validation, but full holder1 -> holder8 tray transport with contact dynamics is not solved yet.
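As a rough illustration of the first two arrows in the pipeline above, the sketch below validates a structured intent and turns it into a symbolic skill request without ever emitting joint actions. The field names (`object_id`, `source`, `target`, `constraints`), the `ALLOWED_SKILLS` set, and the `to_skill_request` helper are hypothetical; they are not taken from the repository's actual IntentPacket schema.

```python
from dataclasses import dataclass, field

# Hypothetical skill names; the real stack may expose different ones.
ALLOWED_SKILLS = {"approach_finisher"}


@dataclass(frozen=True)
class IntentPacket:
    """Illustrative stand-in for the structured L1 output (fields are assumptions)."""
    object_id: str
    source: str
    target: str
    constraints: tuple = field(default_factory=tuple)

    def validate(self) -> None:
        # Reject empty identifiers and no-op moves before anything reaches RL.
        for name in ("object_id", "source", "target"):
            if not getattr(self, name):
                raise ValueError(f"missing field: {name}")
        if self.source == self.target:
            raise ValueError("source and target must differ")


def to_skill_request(packet: IntentPacket, skill: str = "approach_finisher") -> dict:
    """Map a validated intent to a symbolic skill request (no joint actions)."""
    packet.validate()
    if skill not in ALLOWED_SKILLS:
        raise ValueError(f"unknown skill: {skill}")
    return {
        "skill": skill,
        "object": packet.object_id,
        "from": packet.source,
        "to": packet.target,
        "constraints": list(packet.constraints),
    }


packet = IntentPacket("tray1", "shelf_A1", "shelf_B1", ("keep_level", "stable_pose"))
request = to_skill_request(packet)
```

Keeping the request purely symbolic mirrors the stated design choice that L1 never produces raw joint actions or trajectories; those remain the job of the RL skill stack.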
- L1 semantic bridge: Qwen/MCP turns a natural-language command into a validated IntentPacket and skill request. L1 does not output raw joint actions or trajectories.
- Main RL skill path: the final kinematic skill stack is Approach -> Finisher; Dock-Coarse and Bridge were useful diagnostics but are not the main controller.
- Local kinematic workspace: Stage 0-4 success is 1.00; Stage 5 success is 0.93, with about 2.89 mm final position error and 0.0208 rad final orientation error.
- Workspace expansion: later curricula extend the home-start shell beyond Stage 5. Stage 10/11 are larger stress shells, not the full reachable workspace.
- Random-start coverage: the mixed-start experiment reaches about 80.2% success in known-workspace random-start eval, while frontier/full-stress regions remain partial.
- Route curriculum: the dense holder route improved from longest prefix 21 to stable prefix 120, with full-route probe longest prefix 170.
- Live demo tooling: scripts exist for L1 -> L2 -> L3 terminal output, RViz target markers, and Gazebo arm motion for recording.
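The success rates and prefix lengths quoted above are aggregate metrics. A minimal sketch of how such numbers could be computed from per-episode and per-waypoint outcomes follows. The function names and input layout are assumptions for illustration, not the repository's evaluation code, and the sample data is made up. In particular, "longest prefix" is read here as the number of consecutive route waypoints reached before the first failure, which is an interpretation rather than a documented definition.

```python
import math
from typing import Sequence


def success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of episodes that succeeded; 0.0 for an empty evaluation."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def longest_prefix(waypoint_ok: Sequence[bool]) -> int:
    """Count consecutive successful waypoints from the start of the route."""
    count = 0
    for ok in waypoint_ok:
        if not ok:
            break
        count += 1
    return count


# Made-up sample data, only to exercise the helpers.
episodes = [True, True, False, True]
route = [True, True, True, False, True]

rate = success_rate(episodes)    # 3 of 4 episodes succeed
prefix = longest_prefix(route)   # first failure is at index 3

# Unit conversion for the reported Stage 5 orientation error (0.0208 rad).
orientation_deg = math.degrees(0.0208)  # roughly 1.19 degrees
```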
Not yet solved:
- Full holder1 -> holder8 tray transport.
- Full continuous workspace coverage.
- Real camera-grounded perception.
- Contact/friction/object dynamics for tray carrying.
- Real robot deployment.
Further documentation:
- Final project summary
- Official artifacts and numbers
- Final report PDF
- Final presentation
- Demo recording commands
- Current implementation notes
- Docs index
Run the L1 semantic bridge with the deterministic mock backend:
cd /home/jerry/.openclaw/workspace/repos/personal/RL_brain_trainer
source hrl_ws/.venv/bin/activate
export PYTHONPATH=/home/jerry/.openclaw/workspace/repos/personal/RL_brain_trainer/hrl_ws/src/hrl_trainer:$PYTHONPATH
python -m hrl_trainer.v5.qwen_l1_client \
--backend mock_qwen \
--command "Move tray1 from shelf_A1 to shelf_B1 while keeping it level and inserting with a stable pose." \
--output artifacts/v5/qwen_l1_demo/l1_to_rl_skill_request.json

Launch the original Gazebo kitchen scene directly:
cd /home/jerry/.openclaw/workspace/repos/personal/RL_brain_trainer
source /opt/ros/jazzy/setup.zsh
source external/ENPM662_Group4_FinalProject/install/setup.zsh
ros2 launch kitchen_robot_description gazebo.launch.py use_sim_time:=true headless:=false

Run the screen-recording demo after the scene is already open:
cd /home/jerry/.openclaw/workspace/repos/personal/RL_brain_trainer
bash scripts/final/run_live_gz_screen_recording_demo.sh local_skill --no-launch-scene

Check the latest random-start workspace experiment:
cd /home/jerry/.openclaw/workspace/repos/personal/RL_brain_trainer
bash scripts/final/check_full_workspace_randomstart_status.sh workspace_full_coverage_randomstart_overnight_003

Generated training runs, videos, ROS build products, and local demo logs are intentionally ignored. Keep code, configs, runbooks, final report sources, and selected small evidence artifacts in git; keep large checkpoints and runtime outputs local unless explicitly promoted.