
feat: Ardupilot support for Gazebo (+ hardware) drone simulation with video stream and in a warehouse environment#1576

Open
snktshrma wants to merge 3 commits into dimensionalOS:dev from snktshrma:drone-gazebo

Conversation


@snktshrma snktshrma commented Mar 16, 2026

  • Gazebo + ArduPilot SITL: Gazebo video stream (RTP from UDP 5600), configurable as forward-facing camera source in connection module. New blueprints: basic, basic-with-spatial, agentic; all registered.
  • Position & motion: MAVLink position target in local NED with velocity feedforward (type mask fixed).
  • New agent skills: position target, move-by-distance, and yaw (rotate to heading).
  • Tracking skill in Gazebo
  • Spatial memory support and a navigate-using-spatial-memory skill
  • Sim: MAVLink uses local position NED when present.
  • Docs: README section for Gazebo + ArduPilot (SITL, plugin, gz sim, sim_vehicle.py, DimOS).
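The "position target in local NED with velocity feedforward (type mask fixed)" bullet comes down to choosing the right SET_POSITION_TARGET_LOCAL_NED type_mask. A minimal sketch of the mask arithmetic (bit values mirror the MAVLink POSITION_TARGET_TYPEMASK enum; the helper itself is illustrative, not this PR's code):

```python
# MAVLink POSITION_TARGET_TYPEMASK bit values (common message set).
# A set bit means "ignore this field" in the setpoint message.
IGNORE_AX, IGNORE_AY, IGNORE_AZ = 0x040, 0x080, 0x100
IGNORE_YAW, IGNORE_YAW_RATE = 0x400, 0x800

def position_velocity_mask(use_yaw: bool = False) -> int:
    """Mask for a position setpoint with velocity feedforward:
    use x/y/z and vx/vy/vz, ignore accelerations (and optionally yaw)."""
    mask = IGNORE_AX | IGNORE_AY | IGNORE_AZ | IGNORE_YAW_RATE
    if not use_yaw:
        mask |= IGNORE_YAW
    return mask

print(position_velocity_mask())      # 3520: accel + yaw + yaw_rate ignored
print(position_velocity_mask(True))  # 2496: accel + yaw_rate ignored
```

With pymavlink, the resulting mask would be passed to `set_position_target_local_ned_send` along with `MAV_FRAME_LOCAL_NED` position and velocity fields.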
Screenshot from 2026-03-16 19-48-05

To-Do:

  • Add ardupilot-gazebo support with dimos
  • Add agentic control and object tracking/follow
  • Add spatial memory support and navigate by query to spatial database
  • Add Navigation stack support / separate avoidance and planning algorithm
  • Do hardware test

@snktshrma snktshrma changed the title feat: Ardupilot-Gazebo drone simulation with video stream and in a warehouse environment feat: Ardupilot support for hardware and Gazebo drone simulation with video stream and in a warehouse environment Mar 16, 2026
@snktshrma snktshrma changed the title feat: Ardupilot support for hardware and Gazebo drone simulation with video stream and in a warehouse environment feat: Ardupilot support for Gazebo (+ hardware) drone simulation with video stream and in a warehouse environment Mar 16, 2026
@uchibeke

This is a genuinely exciting project — natural language control for humanoids, quadrupeds, drones, and robotic arms is the kind of thing that makes the "agentic AI" category concrete in a way that purely software agents don't.

One thing that jumps out immediately from the architecture: physical hardware agents need pre-action authorization at a fundamentally higher assurance level than software agents. A hallucinated Jira write is recoverable. A hallucinated command to a robotic arm or drone is not.

The standard approach to this problem in software agents — prompt-layer instructions like "always ask before acting" — doesn't hold under adversarial conditions. Prompt injection can instruct an agent to skip confirmation steps. For physical hardware, that failure mode is unacceptable.

The pattern that works: before_tool_call hook enforcement

APort Agent Guardrails implements pre-action authorization at the platform hook level, not the prompt level. Every tool call is intercepted and evaluated against a YAML policy before it executes. The model cannot skip it — there's no prompt or agent response that bypasses the hook.

For physical agents in dimos, this maps to: capability scope enforcement before actuator commands reach hardware. You'd define a policy manifest for each robot/drone's authorized capabilities, and any tool call outside that scope is denied at the framework level before it propagates to the hardware adapter.

The underlying spec — the Open Agent Protocol (OAP), DOI: 10.5281/zenodo.18901596 — also defines agent passports: signed capability manifests that declare what an agent is authorized to do. In a multi-agent dimos workflow (say, a planner agent delegating to a hardware execution agent), passports give you chain-of-custody verification that the executing agent is actually scoped for physical actuation.

"When your agent commands a robotic arm, you need verified capability scope, not a prompt."

A few specifics for dimos:

  • No sidecar or infrastructure required — YAML policy + hook, zero overhead
  • Works at the framework level — integrates before commands reach hardware adapters
  • Public CTF results: 1,149 adversarial social-engineering attempts against the guardrail, 0 bypasses
  • Apache 2.0, open standard

The physical-hardware angle makes the authorization problem more urgent, not just more interesting. Happy to discuss how OAP might fit into the dimos architecture — whether as a framework-level gate before hardware commands, or as a passport layer for multi-agent physical workflows.
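As a generic illustration of the hook-level pattern described above (names and schema are hypothetical, not APort's actual API), a before-tool-call gate enforced in framework code rather than in the prompt might look like:

```python
# Hypothetical framework-level gate: every tool call is checked against a
# static capability policy before it can reach a hardware adapter.
from typing import Any, Callable

POLICY: dict[str, Any] = {  # would normally be loaded from a YAML manifest
    "allowed_tools": {"get_telemetry", "land"},
}

class Denied(Exception):
    """Raised when a tool call falls outside the agent's capability scope."""

def before_tool_call(tool: str, policy: dict[str, Any]) -> None:
    # Enforcement happens here; no model output can skip this check.
    if tool not in policy["allowed_tools"]:
        raise Denied(f"tool '{tool}' outside capability scope")

def call_tool(tool: str, fn: Callable[[], Any], policy: dict[str, Any] = POLICY) -> Any:
    before_tool_call(tool, policy)  # gate runs before the actuator command
    return fn()

print(call_tool("land", lambda: "landing"))   # in scope: executes
try:
    call_tool("arm_motors", lambda: "armed")  # out of scope: denied pre-execution
except Denied as e:
    print(e)
```

The point of the pattern is that the denial happens before `fn()` runs, so a prompt-injected "skip confirmation" instruction has nothing to bypass.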

Repo: https://github.com/aporthq/aport-agent-guardrails
Spec: https://github.com/aporthq/aport-spec
DOI: https://doi.org/10.5281/zenodo.18901596

@snktshrma snktshrma force-pushed the drone-gazebo branch 4 times, most recently from 3729b54 to 5e55505 on March 26, 2026 19:51
@snktshrma snktshrma marked this pull request as ready for review March 26, 2026 19:52
@greptile-apps

greptile-apps bot commented Mar 26, 2026

Greptile Summary

This PR adds full Gazebo + ArduPilot SITL simulation support to the DimOS drone stack: a new GazeboVideoStream module decodes H264/RTP from UDP 5600 via GStreamer; three new blueprint variants (drone-basic-gazebo, drone-agentic-gazebo, drone-agentic-gazebo-spatial) wire this into the existing pipeline; and new agent skills (move_by_distance, go_to_position, rotate_to, navigate_to_where_i_saw) extend local NED navigation. The MAVLink layer now prefers LOCAL_POSITION_NED for indoor odometry, rotate_to switches from GPS heading to ATTITUDE.yaw, and the visual servoing controller is refactored from PID-based strafe to proportional forward-speed + yaw-rate + vz control for a forward camera.

Key changes and issues found:

  • P1: follow_object was changed from a streaming generator to a fire-and-forget string return; the duration parameter no longer stops tracking after that many seconds — the drone tracks indefinitely regardless of what is passed.
  • P1: The docker dependency extra replaced opencv-python-headless with opencv-contrib-python==4.10.0.84, a full GUI-linked build. Headless Docker containers may require additional system libraries that were not needed before.
  • P2: DEFAULT_VERTICAL_ERROR_GAIN is 0.0012 in the class body but the __init__ docstring says "Default 0.0008".
  • P2: x_pid, y_pid, z_pid are instantiated in DroneVisualServoingController.__init__ but the new compute_velocity_control never reads their output, creating dead code.

Confidence Score: 3/5

Safe for Gazebo simulation use; the follow_object duration contract and Docker headless dependency issues should be addressed before wider deployment.

The Gazebo SITL path is well-structured and the new skills work correctly in isolation. However, two P1 issues reduce confidence: the follow_object skill silently ignores the duration bound (a regression from the generator implementation), and the Docker dependency swap from headless to GUI-linked OpenCV may break CI/Docker deployments.

Files needing attention: dimos/robot/drone/connection_module.py (follow_object duration regression), pyproject.toml (Docker opencv headless → contrib swap)

Important Files Changed

| Filename | Overview |
| --- | --- |
| dimos/robot/drone/connection_module.py | Adds move_by_distance, go_to_position, rotate_to skills and integrates GazeboVideoStream. The follow_object skill was changed from a generator to a fire-and-forget string, which silently breaks the duration tracking contract. |
| dimos/robot/drone/mavlink_connection.py | Adds move_by_distance_body_m, set_position_target, and rotate_to using ATTITUDE.yaw for heading; improves _publish_odom to prefer LOCAL_POSITION_NED over GPS in indoor mode; move_twist now includes yaw_rate support with correct type_mask. |
| dimos/robot/drone/drone_visual_servoing_controller.py | Refactored from PID-based strafe control to proportional-gain forward-speed + yaw-rate + vz control for a forward camera. PID controllers are still instantiated but never used (dead code). DEFAULT_VERTICAL_ERROR_GAIN = 0.0012 but the docstring says Default 0.0008. |
| dimos/robot/drone/gazebo_video_stream.py | New file: GStreamer-based H264/RTP UDP decoder for the Gazebo camera. The _error_monitor thread may briefly block on readline() after stop. Overall structure is sound. |
| dimos/robot/drone/drone_spatial_nav_skill.py | New skill: queries the SpatialMemory CLIP index by text, converts the stored world pose to local NED, and sends a position target. Coordinate-transform logic is consistent with mavlink_connection internal axes. |
| dimos/robot/drone/drone_tracking_module.py | Tracking now sends yaw_rate + vz instead of strafe vy; forward velocity is zeroed after FORWARD_VELOCITY_DURATION seconds; **kwargs forwarded to super(). Logic is consistent with the updated servoing controller. |
| pyproject.toml | Replaces opencv-python + opencv-python-headless with opencv-contrib-python==4.10.0.84 everywhere, including Docker. The Docker extra now links against GUI libraries where opencv-python-headless was previously used. |
| dimos/robot/drone/blueprints/agentic/drone_agentic.py | Adds drone_agentic_gazebo and drone_agentic_gazebo_spatial blueprints; system prompt updated with clearer skill descriptions. Straightforward composition. |
| dimos/robot/drone/blueprints/basic/drone_basic.py | Adds drone_basic_gazebo and drone_basic_gazebo_spatial blueprints composing DroneConnectionModule with video_source=gazebo and optional SpatialMemory. Clean composition. |
| dimos/robot/drone/specs.py | New Protocol/Spec definitions for SpatialMemoryQuerySpec and DroneGoToPositionSpec. Clean structural typing with no issues. |
| dimos/robot/drone/test_drone.py | New unit tests for move_by_distance delegation, NED fallback, and updated visual-servoing assertions. Tests correctly reflect the new API contract. |
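The "converts stored world pose to local NED" step in drone_spatial_nav_skill.py is, in the common case, an ENU-to-NED axis swap. A hedged sketch (assuming the stored world poses are ENU, which this review does not state explicitly):

```python
def enu_to_ned(x_e: float, y_e: float, z_e: float) -> tuple[float, float, float]:
    """Convert ENU (x east, y north, z up) to NED (x north, y east, z down).
    This is the standard frame swap; any map-origin offset would be applied first."""
    return (y_e, x_e, -z_e)

print(enu_to_ned(1.0, 2.0, 3.0))  # (2.0, 1.0, -3.0)
```

Applying it twice returns the original coordinates, which makes it easy to sanity-check in tests.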

Sequence Diagram

```mermaid
sequenceDiagram
    participant GZ as Gazebo SITL
    participant GVS as GazeboVideoStream
    participant GStreamer as gst-launch-1.0
    participant DCM as DroneConnectionModule
    participant MLC as MavlinkConnection
    participant DTM as DroneTrackingModule
    participant DSC as DroneVisualServoingController
    participant DSNS as DroneSpatialNavSkill
    participant SM as SpatialMemory

    GZ->>GStreamer: H264/RTP UDP:5600
    GStreamer->>GVS: raw RGB frames (stdout)
    GVS->>DCM: video Subject[Image]

    GZ->>MLC: MAVLink UDP:14550 (LOCAL_POSITION_NED, ATTITUDE)
    MLC->>DCM: telemetry + odometry

    DCM->>DTM: video frames
    DTM->>DSC: compute_velocity_control(target_x, target_y)
    DSC-->>DTM: (vx, vy=0, vz, yaw_rate)
    DTM->>DCM: Twist cmd_vel (move_twist)
    DCM->>MLC: SET_POSITION_TARGET_LOCAL_NED (body NED, vx+yaw_rate)
    MLC->>GZ: MAVLink velocity command

    Note over DSNS,SM: navigate_to_where_i_saw flow
    DSNS->>SM: query_by_text(description)
    SM-->>DSNS: [{metadata: {pos_x, pos_y, pos_z}}]
    DSNS->>DCM: go_to_position(ned_x, ned_y, ned_z)
    DCM->>MLC: set_position_target(x, y, z)
    MLC->>GZ: SET_POSITION_TARGET_LOCAL_NED (MAV_FRAME_LOCAL_NED)
```
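In the GStreamer-to-GazeboVideoStream hop of the diagram, the subprocess's stdout is a raw RGB byte stream, so frame extraction is fixed-size chunking. A sketch under assumed geometry (640x480 RGB; the function name is illustrative, not the module's actual API):

```python
WIDTH, HEIGHT, CHANNELS = 640, 480, 3   # assumed camera geometry
FRAME_BYTES = WIDTH * HEIGHT * CHANNELS

def frames_from_stream(buf: bytes) -> list[bytes]:
    """Split a raw RGB byte stream into fixed-size frames, dropping any partial
    tail. Each chunk would typically be reshaped to (HEIGHT, WIDTH, 3) downstream."""
    return [buf[off:off + FRAME_BYTES]
            for off in range(0, len(buf) - FRAME_BYTES + 1, FRAME_BYTES)]

data = bytes(FRAME_BYTES) * 2 + b"\x00" * 10  # two full frames plus a partial tail
print(len(frames_from_stream(data)))  # 2
```

The real module reads incrementally from the pipe rather than from one buffer, but the framing invariant (exactly WIDTH*HEIGHT*CHANNELS bytes per image) is the same.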

Comments Outside Diff (2)

  1. dimos/robot/drone/connection_module.py, line 425-486 (link)

    P1 follow_object silently ignores the duration tracking contract

    The old implementation was a generator that would yield status updates and keep monitoring for the full duration seconds, then stop the drone when it expired. The new implementation:

    1. Sends the command
    2. Polls for at most min(duration, 30.0) seconds for an initial confirmation
    3. Returns a string — the drone then tracks indefinitely (or until stop_follow) regardless of the duration argument

    This means follow_object("person", duration=10) does not stop tracking after 10 seconds — it just caps the initial confirmation window at 10 s. An agent calling this with an explicit duration to bound tracking time will be silently surprised when tracking never stops on its own.

    The DroneTrackingModule.track_object method still respects duration <= 0 → run forever, so the agent has no way to pass a meaningful duration through the new skill API. Consider one of:

    • Sending a stop command after duration has elapsed (in a background thread)
    • Documenting clearly that duration only controls the initial status-wait window, not the tracking lifetime
    • Renaming the parameter to status_timeout to avoid confusion
  2. dimos/robot/drone/gazebo_video_stream.py, line 1007-1015 (link)

    P2 _error_monitor may hang briefly on readline() after stop()

    The loop guard checks not self._stop_event.is_set() before each iteration, but the actual blocking call self._process.stderr.readline() runs inside the loop after the guard has already passed. When stop() terminates the subprocess, the stderr pipe will eventually close and readline() returns b"", but there can be a delay if the OS buffers the pipe close.

    Because the thread is daemon=True this won't prevent process exit, but it can delay clean teardown during testing. A common pattern is to readline() first and break on empty bytes:

    ```python
    def _error_monitor(self) -> None:
        while self._process and self._process.stderr:
            line = self._process.stderr.readline()
            if not line:
                break
            if self._stop_event.is_set():
                break
            s = line.decode("utf-8", errors="replace").strip()
            if "ERROR" in s or "WARNING" in s:
                logger.warning("GStreamer: %s", s)
    ```
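For the follow_object duration regression in the first comment, the background-stop option can be as small as a threading.Timer. A sketch where start_follow/stop_follow stand in for whatever start and stop entry points the module actually exposes:

```python
import threading
from typing import Callable, Optional

def follow_with_bound(start_follow: Callable[[], None],
                      stop_follow: Callable[[], None],
                      duration: float) -> Optional[threading.Timer]:
    """Start tracking, then stop it after `duration` seconds in the background.
    duration <= 0 preserves the old 'run until stop_follow' behaviour."""
    start_follow()
    if duration > 0:
        timer = threading.Timer(duration, stop_follow)
        timer.daemon = True  # never blocks process exit
        timer.start()
        return timer
    return None

state = {"tracking": False}
timer = follow_with_bound(lambda: state.update(tracking=True),
                          lambda: state.update(tracking=False),
                          duration=0.05)
timer.join()              # wait for the bound to fire (demo only)
print(state["tracking"])  # False
```

This keeps the skill fire-and-forget from the agent's point of view while restoring the contract that follow_object("person", duration=10) actually stops after 10 seconds.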

Reviews (1): Last reviewed commit: "feat: Added spatial memory and query to ..."

Comment on lines +31 to +57
```python
DEFAULT_VERTICAL_ERROR_GAIN = 0.0012
MAX_VZ = 0.45  # m/s
# Gain: lateral pixel error -> yaw rate (rad/s). Object right of center -> positive yaw_rate (turn right).
DEFAULT_LATERAL_ERROR_TO_YAW_RATE = 0.001
MAX_YAW_RATE = 0.5  # rad/s

def __init__(
    self,
    x_pid_params: PIDParams,
    y_pid_params: PIDParams,
    z_pid_params: PIDParams | None = None,
    forward_camera: bool = True,
    forward_speed: float | None = None,
    vertical_error_gain: float | None = None,
    lateral_error_to_yaw_rate: float | None = None,
) -> None:
    """
    Initialize drone visual servoing controller.

    Args:
        x_pid_params: (kp, ki, kd, output_limits, integral_limit, deadband) for forward/back
        y_pid_params: (kp, ki, kd, output_limits, integral_limit, deadband) for left/right
        z_pid_params: Optional params for altitude control
        x_pid_params: Reserved for forward from image later.
        y_pid_params: Reserved (lateral error drives yaw rate, not strafe).
        z_pid_params: Optional; unused when using vertical_error_gain.
        forward_camera: Reserved for later.
        forward_speed: Constant vx (m/s). Default 0.2.
        vertical_error_gain: Image vertical error (px) -> vz. Default 0.0008.
        lateral_error_to_yaw_rate: Image lateral error (px) -> yaw_rate (rad/s). Default 0.001.
```

P2 Default gain docstring mismatch

DEFAULT_VERTICAL_ERROR_GAIN is 0.0012 at the class level, but the __init__ docstring says the default is 0.0008. Any user who reads the docstring to understand the gain they're inheriting will configure their system based on the wrong value.

Suggested change

```diff
         forward_speed: Constant vx (m/s). Default 0.2.
-        vertical_error_gain: Image vertical error (px) -> vz. Default 0.0008.
+        vertical_error_gain: Image vertical error (px) -> vz. Default 0.0012.
         lateral_error_to_yaw_rate: Image lateral error (px) -> yaw_rate (rad/s). Default 0.001.
```

@@ -322,7 +322,7 @@ docker = [
"pydantic-settings>=2.11.0,<3",

P1 opencv-python-headless replaced by GUI-linked opencv-contrib-python in Docker

The docker extra previously declared opencv-python-headless, which is a headless build specifically designed for server / Docker environments (no libGL, no libGTK, no display-related .so requirements). This PR replaces it with opencv-contrib-python==4.10.0.84, which is a full GUI build.

Consequences:

  • Headless Docker images without libGL.so.1 / GTK may get import-time warnings or ImportError for display-related modules.
  • opencv-contrib-python==4.10.0.84 is several releases behind the opencv-python==4.13.0.92 still present in the lockfile, creating a version skew in the dependency graph that can produce subtle runtime behaviour differences.

If opencv-contrib modules (e.g. cv2.TrackerCSRT_create) are needed, consider keeping the headless variant or using a separate optional extra for GUI support:

Suggested change
"pydantic-settings>=2.11.0,<3",
"opencv-contrib-python-headless==4.10.0.84",

Comment on lines 37 to +71
```python
def __init__(
    self,
    x_pid_params: PIDParams,
    y_pid_params: PIDParams,
    z_pid_params: PIDParams | None = None,
    forward_camera: bool = True,
    forward_speed: float | None = None,
    vertical_error_gain: float | None = None,
    lateral_error_to_yaw_rate: float | None = None,
) -> None:
    """
    Initialize drone visual servoing controller.

    Args:
        x_pid_params: (kp, ki, kd, output_limits, integral_limit, deadband) for forward/back
        y_pid_params: (kp, ki, kd, output_limits, integral_limit, deadband) for left/right
        z_pid_params: Optional params for altitude control
        x_pid_params: Reserved for forward from image later.
        y_pid_params: Reserved (lateral error drives yaw rate, not strafe).
        z_pid_params: Optional; unused when using vertical_error_gain.
        forward_camera: Reserved for later.
        forward_speed: Constant vx (m/s). Default 0.2.
        vertical_error_gain: Image vertical error (px) -> vz. Default 0.0008.
        lateral_error_to_yaw_rate: Image lateral error (px) -> yaw_rate (rad/s). Default 0.001.
    """
    self.x_pid = PIDController(*x_pid_params)
    self.y_pid = PIDController(*y_pid_params)
    self.z_pid = PIDController(*z_pid_params) if z_pid_params else None
    self.forward_camera = forward_camera
    self.forward_speed = forward_speed if forward_speed is not None else self.DEFAULT_FORWARD_SPEED
    self.vertical_error_gain = (
        vertical_error_gain if vertical_error_gain is not None else self.DEFAULT_VERTICAL_ERROR_GAIN
    )
    self.lateral_error_to_yaw_rate = (
        lateral_error_to_yaw_rate
        if lateral_error_to_yaw_rate is not None
        else self.DEFAULT_LATERAL_ERROR_TO_YAW_RATE
    )
```

P2 PID controllers initialized but never used in compute_velocity_control

x_pid, y_pid, and z_pid are still instantiated but compute_velocity_control now uses only proportional gains — the PID state is never read during a tracking session. The docstring says the params are "reserved for future use", but dead code that is visibly constructed and reset (see reset()) can mislead maintainers into thinking tuning x_pid_params/y_pid_params affects tracking behaviour.

Consider either:

  • Removing the PID instantiation and reset() entries until the feature is actually implemented, or
  • Adding a runtime-visible comment when reset() is called so it's clear the PID state isn't driving control output yet
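The proportional law that replaced the PIDs (constant forward speed, lateral pixel error to yaw rate, vertical pixel error to vz) fits in a few lines. Gains and limits below mirror the defaults quoted in this review; the function itself and its sign conventions are an illustrative sketch, not the module's actual implementation:

```python
def clamp(v: float, limit: float) -> float:
    return max(-limit, min(limit, v))

def compute_velocity_control(err_x_px: float, err_y_px: float,
                             forward_speed: float = 0.2,
                             k_yaw: float = 0.001,   # DEFAULT_LATERAL_ERROR_TO_YAW_RATE
                             k_vz: float = 0.0012,   # DEFAULT_VERTICAL_ERROR_GAIN
                             max_yaw_rate: float = 0.5,
                             max_vz: float = 0.45) -> tuple[float, float, float, float]:
    """Forward-camera servoing: vx is constant, lateral error steers yaw rate,
    vertical error drives vz; vy (strafe) is always zero."""
    yaw_rate = clamp(k_yaw * err_x_px, max_yaw_rate)  # object right of center -> turn right
    vz = clamp(k_vz * err_y_px, max_vz)               # sign convention assumed here
    return forward_speed, 0.0, vz, yaw_rate           # (vx, vy, vz, yaw_rate)

print(compute_velocity_control(100.0, -50.0))
```

Because only proportional gains appear, none of the PID state (integral, derivative, deadband) influences the output, which is exactly the dead-code concern raised above.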

ouazmourad added 3 commits to ouazmourad/dimos-21-days-sprint that referenced this pull request Mar 26, 2026