feat(repl): add repl #1665

Open
paul-nechifor wants to merge 1 commit into dev from paul/feat/repl

Conversation

@paul-nechifor
Contributor

@paul-nechifor paul-nechifor commented Mar 25, 2026

Problem

  • We don't have a way to inspect a running system.
  • Agents don't have enough access with just CLI commands.

Closes DIM-743

Solution

  • Used rpyc to connect to all running modules and proxy objects.
  • We can now use dimos repl to start an IPython REPL and call methods on modules or get any property.

Breaking Changes

None

How to Test

uv run dimos run unitree-go2-agentic

and then:

●  uv run dimos repl
DimOS REPL
Connected to localhost:18861

  coordinator  ModuleCoordinator instance
  modules()    List deployed module names
  get(name)    Get module instance by class name


In [1]: get('WavefrontFrontierExplorer')
Out[1]: <dimos.navigation.frontier_exploration.wavefront_frontier_goal_selector.WavefrontFrontierExplorer object at 0x77dda4604680>

In [2]: _.begin_exploration()
Out[2]: 'Started exploration skill. The robot is now moving. Use end_exploration to stop. You also need to cancel before starting a new movement tool.'

In [3]: 

Contributor License Agreement

  • I have read and approved the CLA.

@greptile-apps
Contributor

greptile-apps bot commented Mar 25, 2026

Greptile Summary

This PR adds an interactive RPyC-based REPL (dimos repl) for inspecting live running DimOS systems. When dimos run starts, a coordinator-level RPyC server is launched on a fixed port (default 18861) and each worker process gets its own auto-assigned RPyC server. dimos repl then connects to the coordinator, resolves module locations, and proxies objects directly from worker processes into an IPython (or stdlib code) session. The approach is well-structured, fully tested, and fits naturally into the existing worker/coordinator architecture.
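The per-worker servers' auto-assigned ports rely on standard port-0 socket semantics (the same behaviour rpyc depends on when binding): the OS picks a free ephemeral port at bind() time, so the port is known before serving begins. A minimal stdlib sketch of that mechanism:

```python
import socket

# Port-0 auto-assignment: binding with port 0 makes the kernel pick a free
# ephemeral port immediately, so the chosen port can be read back right
# after bind(), before any server thread starts accepting connections.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("127.0.0.1", 0))
assigned_port = s.getsockname()[1]
s.close()
```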

Issues found:

  • Daemon PID regression (P1): RunEntry is now created before daemonize() is called, so pid=os.getpid() captures the parent process PID rather than the daemon's PID. After the double-fork in daemonize(), the daemon process has a different PID. entry.save() is called inside the daemon with this stale PID, causing dimos stop and dimos status to target an already-exited process and never successfully signal the daemon. The fix is to reassign entry.pid = os.getpid() immediately after daemonize(log_dir).
  • No version bound for rpyc (P2): The dependency is declared without a version specifier in pyproject.toml, leaving the project open to future breaking changes from rpyc major releases.
  • find_free_port TOCTOU (P2): The shared fixture closes the socket before returning the port, creating a small window where the port could be claimed by another process in parallel CI.
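The P1 fix can be sketched as follows; RunEntry and daemonize are simplified stand-ins for the real implementations, shown only to illustrate the ordering:

```python
import os

def daemonize_stub():
    """Stand-in for the real daemonize(): the production version double-forks
    and detaches, so the process that continues has a *different* PID than
    the one that entered the function. This stub only marks the boundary."""
    pass

class RunEntry:
    """Minimal stand-in for the run-registry entry described in the review."""
    def __init__(self, pid: int):
        self.pid = pid

# Buggy ordering (as flagged): PID captured before the fork boundary.
entry = RunEntry(pid=os.getpid())
daemonize_stub()                 # the real daemonize() changes the PID here
# Fix: refresh the PID *after* daemonize() so `dimos stop` / `dimos status`
# signal the daemon rather than the already-exited parent.
entry.pid = os.getpid()
```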

Confidence Score: 3/5

  • Not safe to merge until the daemon PID regression is fixed — dimos stop will silently fail for all daemonised instances.
  • The REPL feature itself (repl_server.py, repl.py, worker IPC, module_coordinator additions) is well-implemented and thoroughly tested. However, refactoring the RunEntry creation to a single pre-fork location introduced a regression: in daemon mode the stored PID is the parent's PID, not the daemon's. This breaks dimos stop and dimos status — core production CLI commands — in the primary daemon usage pattern. That meets the threshold for a 3/5 per the guidance (likely production reliability problem in normal usage).
  • dimos/robot/cli/dimos.py — the pid=os.getpid() call must move to after daemonize().

Important Files Changed

Filename Overview
dimos/robot/cli/dimos.py Adds --repl/--no-repl and --repl-port options to dimos run, and a new repl subcommand. Introduces a regression: RunEntry.pid is captured before daemonize(), so the stored PID is the parent's (already-exited) PID in daemon mode, breaking dimos stop / dimos status.
dimos/core/repl_server.py New file implementing ReplServer (coordinator-side RPyC server) and start_worker_repl_server (per-worker RPyC server). Both use ThreadedServer with allow_all_attrs/setattr/delattr — intentional for a debug REPL. Port 0 auto-assignment is handled correctly: rpyc binds the socket in __init__, so server.port is reliable before the thread starts.
dimos/robot/cli/repl.py New REPL client: auto-detects port from run registry, connects via rpyc, and starts either an IPython or stdlib code.interact session with coordinator, modules(), and get(name) pre-populated. Connection cleanup in finally is correct; sync_request_timeout=None is appropriate for an interactive REPL.
dimos/core/module_coordinator.py Adds list_modules, get_module, get_module_location, and start_repl_server methods, plus _module_locations tracking. Introduces a guarded client property to replace duplicated "not started" checks. stop() correctly tears down the REPL server before other resources.
dimos/core/worker.py Adds Worker.start_repl_server() (sends IPC message to worker process) and handles the start_repl_server message type in _worker_loop. Both are wrapped in the existing try/except Exception error-handling block, so failures surface cleanly as error responses rather than crashing the loop.
dimos/conftest.py Adds shared test fixtures (find_free_port, wait_until_rpyc_connectable, make_stub_coordinator) and the _StubCoordinator helper used across the new REPL tests. Minor TOCTOU in find_free_port (socket closed before caller binds), which is standard but could cause rare flakiness in parallel CI.
pyproject.toml Adds rpyc as a runtime dependency and suppresses mypy errors for rpyc/rpyc.*. No version bound is specified, leaving the project open to future breaking changes from rpyc major releases.
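On the missing version bound for rpyc, a bounded specifier is the usual remedy; the exact bounds are a maintainer choice, shown here only as illustration:

```toml
[project]
dependencies = [
    # Pin to the current major series so a future rpyc major release
    # cannot silently change the ThreadedServer / Connection behaviour.
    "rpyc>=6.0,<7",
]
```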

Sequence Diagram

sequenceDiagram
    participant User
    participant CLI as dimos CLI
    participant Coord as ModuleCoordinator
    participant ReplSrv as ReplServer (port 18861)
    participant Worker as Worker Process
    participant WorkerSrv as WorkerReplServer (port 0→N)

    Note over CLI,WorkerSrv: dimos run (startup)
    CLI->>Coord: build() + start()
    CLI->>Coord: start_repl_server(port=18861)
    Coord->>Worker: start_repl_server IPC message
    Worker->>WorkerSrv: start_worker_repl_server(instances)
    WorkerSrv-->>Worker: listening on port N
    Worker-->>Coord: port N
    Coord->>Coord: _module_locations["ModuleX"] = ("localhost", N)
    Coord->>ReplSrv: ReplServer(coordinator).start()

    Note over User,WorkerSrv: dimos repl (client session)
    User->>CLI: dimos repl
    CLI->>ReplSrv: rpyc.connect(localhost, 18861)
    CLI->>ReplSrv: root.get_coordinator()
    ReplSrv-->>CLI: coordinator proxy
    CLI->>User: IPython REPL (coordinator, modules(), get())

    User->>CLI: get("ModuleX")
    CLI->>ReplSrv: root.get_module_location("ModuleX")
    ReplSrv-->>CLI: ("localhost", N)
    CLI->>WorkerSrv: rpyc.connect(localhost, N)
    CLI->>WorkerSrv: root.get_instance_by_name("ModuleX")
    WorkerSrv-->>CLI: ModuleX proxy (allow_all_attrs)
    CLI-->>User: <ModuleX object>

Comments Outside Diff (2)

  1. pyproject.toml, line 1068 (link)

    P2 No version constraint for rpyc

    rpyc is added without a version specifier, which means any future major release (e.g., 7.x) could be resolved and potentially introduce breaking changes to the ThreadedServer API or Connection behaviour used in repl_server.py.

  2. dimos/conftest.py, line 27-31 (link)

    P2 TOCTOU race in find_free_port

    The socket is closed before the port number is returned to the caller. Between closing the socket and the test actually binding to that port there is a small window in which another OS process (or a parallel pytest worker) could claim the same ephemeral port, leading to an Address already in use error and a flaky test.

    The same pattern was previously present in test_unity_sim.py (which this PR correctly consolidates here). The standard mitigation is to keep the socket open and pass it directly to the server, or to use SO_REUSEADDR — but that is a broader test-infrastructure concern rather than a blocker for this PR.
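The keep-the-socket-open mitigation can be sketched as below; reserve_port is a hypothetical helper name (the real fixture is find_free_port):

```python
import socket

def reserve_port() -> tuple[socket.socket, int]:
    """Bind an ephemeral port and return the still-open socket alongside the
    port number, so the caller can hand the socket to the server directly
    instead of closing and re-binding (which opens the TOCTOU window)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind(("127.0.0.1", 0))
    return s, s.getsockname()[1]

sock, port = reserve_port()
# ... pass `sock` (not just `port`) to the code under test ...
sock.close()
```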

Reviews (1): Last reviewed commit: "feat(repl): add repl"

@leshy
Contributor

leshy commented Mar 25, 2026

I understand this is done for interacting with running dimos, I assume ideally we can use the same API if deploying a blueprint and playing with it as a part of an actual python file right?

in py:

bla = dimos.deploy(something)
bla.nav.goto(..)

vs (in terminal)

uv run dimos --daemon run ...
uv run dimos repl

then

coordinator.nav.goto(..)

would be nice to have parity here

or if I have a running dimos I (or agent!) should be able to write

test.py

bla = dimos.connect()
bla.nav.goto(..)

right? (ignore actual API I'm using, it's an example imaginary API)

would be great to have docs/ showing all 3 usecases if ready,
if not - do you agree with my thoughts, do want to do this in this PR or separate?

@paul-nechifor
Contributor Author

> I understand this is done for interacting with running dimos, I assume ideally we can use the same API if deploying a blueprint and playing with it as a part of an actual python file right?
>
> in py:
>
> bla = dimos.deploy(something)
> bla.nav.goto(..)
>
> vs (in terminal)
>
> uv run dimos --daemon run ...
> uv run dimos repl
>
> then
>
> coordinator.nav.goto(..)
>
> would be nice to have parity here
>
> or if I have a running dimos I (or agent!) should be able to write
>
> test.py
>
> bla = dimos.connect()
> bla.nav.goto(..)
>
> right? (ignore actual API I'm using, it's an example imaginary API)
>
> would be great to have docs/ showing all 3 usecases if ready, if not - do you agree with my thoughts, do want to do this in this PR or separate?

But this has always been possible; it's just hard to specify a system composition in a REPL. Also, we only give access to methods marked with @rpc, whereas rpyc gives you access to any methods/properties on the object.

Example blueprint start in REPL:

>>> from dimos.robot.unitree.go2.connection import GO2Connection
>>> b = GO2Connection.blueprint()
>>> mc = b.build()
23:14:51.426[inf][dimos/core/blueprints.py      ] Building the blueprint
23:14:51.431[inf][dimos/core/blueprints.py      ] Starting the modules
23:14:51.454[inf][dimos/core/worker_manager.py  ] Worker pool started. n_workers=2
23:15:02.099[inf][dimos/core/worker.py          ] Deployed module. module=GO2Connection module_id=0 worker_id=0
23:15:02.101[inf][dimos/core/blueprints.py      ] Transport module=GO2Connection name=pointcloud original_name=pointcloud topic=/pointcloud#sensor_msgs.PointCloud2 transport=LCMTransport type=dimos.msgs.sensor_msgs.PointCloud2.PointCloud2
23:15:02.101[inf][dimos/core/blueprints.py      ] Transport module=GO2Connection name=color_image original_name=color_image topic=/color_image#sensor_msgs.Image transport=LCMTransport type=dimos.msgs.sensor_msgs.Image.Image
23:15:02.101[inf][dimos/core/blueprints.py      ] Transport module=GO2Connection name=camera_info original_name=camera_info topic=/camera_info#sensor_msgs.CameraInfo transport=LCMTransport type=dimos.msgs.sensor_msgs.CameraInfo.CameraInfo
23:15:02.102[inf][dimos/core/blueprints.py      ] Transport module=GO2Connection name=cmd_vel original_name=cmd_vel topic=/cmd_vel#geometry_msgs.Twist transport=LCMTransport type=dimos.msgs.geometry_msgs.Twist.Twist
23:15:02.102[inf][dimos/core/blueprints.py      ] Transport module=GO2Connection name=odom original_name=odom topic=/odom#geometry_msgs.PoseStamped transport=LCMTransport type=dimos.msgs.geometry_msgs.PoseStamped.PoseStamped
23:15:02.102[inf][dimos/core/blueprints.py      ] Transport module=GO2Connection name=lidar original_name=lidar topic=/lidar#sensor_msgs.PointCloud2 transport=LCMTransport type=dimos.msgs.sensor_msgs.PointCloud2.PointCloud2
*************** EP Error ***************
EP Error /onnxruntime_src/onnxruntime/python/onnxruntime_pybind_state.cc:539 void onnxruntime::python::RegisterTensorRTPluginsAsCustomOps(PySessionOptions&, const onnxruntime::ProviderOptions&) Please install TensorRT libraries as mentioned in the GPU requirements page, make sure they're in the PATH or LD_LIBRARY_PATH, and that your GPU is supported.
 when using ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.
****************************************
23:15:14.060[inf][os/simulation/mujoco/policy.py] Loaded policy: /home/p/pro/dimensional/dimos/data/mujoco_sim/unitree_go1_policy.onnx with providers: ['CUDAExecutionProvider', 'CPUExecutionProvider']
23:15:14.160[inf][t/unitree/mujoco_connection.py] MuJoCo process started successfully
>>> go2 = mc.get_instance(GO2Connection)
>>> go2.
go2.actor_class(       go2.actor_instance     go2.remote_name        go2.rpc                go2.rpcs               go2.stop_rpc_client()  
>>> go2.liedown()
True
>>> 


@leshy leshy left a comment


some API change requests for easier access, idk if you want to merge first then iterate, seems isolated enough to iterate here?

```python
['GO2Connection', 'RerunBridge', 'McpServer', ...]

# Get a module instance and call methods on it
>>> wfe = get('WavefrontFrontierExplorer')
```

this feels slightly awkward, why can't I just have WavefrontFrontierExplorer already there in the namespace?

```python
# List all deployed modules
>>> modules()
['GO2Connection', 'RerunBridge', 'McpServer', ...]
```

isn't it nicer to just have a list of actual instances?

```python
"Started exploring."

# Access the coordinator directly
>>> coordinator.list_modules()
```

isn't it nicer to just have a list of actual instances?

or even better if I have a magic namespace I can do coordinator.modules.WavefrontFrontierExplorer

you have repl so coordinator.modules. TAB gives you autocomplete etc

(memory2 does this for streams)
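The "magic namespace" idea is straightforward to prototype; a hypothetical sketch (names invented, not the dimos API) of attribute access plus __dir__-driven tab completion:

```python
class ModuleNamespace:
    """Hypothetical sketch of the suggestion above: attribute access resolves
    deployed module instances, and __dir__ drives REPL tab completion."""

    def __init__(self, registry: dict):
        self._registry = registry  # module name -> instance (or rpyc proxy)

    def __getattr__(self, name: str):
        # Only called when normal attribute lookup fails, so _registry
        # itself is found the usual way.
        try:
            return self._registry[name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # IPython calls dir() for completion, so `modules.<TAB>` lists
        # the deployed module names.
        return sorted(self._registry)

modules = ModuleNamespace({"WavefrontFrontierExplorer": object(),
                           "GO2Connection": object()})
```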

@leshy
Contributor

leshy commented Mar 26, 2026

> But this has always been possible it's just hard to specify a system composition in a REPL. Also, we only give access to call methods marked with @rpc. rpyc gives you access to any methods/properties on the object.

yes I know, just making sure that

  • the REPL API,
  • the "I write a small script to interact with live dimos" API,
  • the "I write a small script that deploys a blueprint then interacts with it" API,
  • and the "I run IPython directly and deploy a blueprint then interact with it" API

is all the same API, you just change how you interact with a deployed thing, and this is what I'd document, not REPL as a single entrypoint

@leshy
Contributor

leshy commented Mar 26, 2026

after chatting, update here, example of unified interaction with dimos - it doesn't have to be this exact same API, point is it's the same API in all cases

I use repl to talk to running dimos
dimos repl

> dimos.modules. <tab>
Go2Connection, VoxelMapper, WaveFrontExplorer, ...

> dimos.modules.WaveFrontExplorer.
<tab>
Start, Stop, start_exploration, some_skill...

> dimos.modules.WaveFrontExplorer.start_exploration()
...

> dimos.modules.WaveFrontExplorer.some_skill(`some_skill_arg`)
...

I run dimos (dimos run unitre...) then write a script
Script talks to dimos
bla.py

from dimos.core import connect

dimos = connect() #idk exact naming
dimos.modules.WaveFrontExplorer.start_exploration()
dimos.modules.WaveFrontExplorer.some_skill(`some_skill_arg`)

I run my blueprint by myself
bla.py (this can be a script but could also be an ipython interaction)

from dimos.robot.unitree.go2.connection import GO2Connection
dimos = GO2Connection.blueprint().build()

dimos.modules.WaveFrontExplorer.start_exploration()
dimos.modules.WaveFrontExplorer.some_skill(`some_skill_arg`)

above assumes we cannot deploy multiple blueprints in the same instance, I think we want this, so will iterate on that separately
