
feat: Ardupilot support for Gazebo (+ hardware) drone simulation with video stream and in a warehouse environment#1576

Open
snktshrma wants to merge 3 commits into dimensionalOS:dev from snktshrma:drone-gazebo

Conversation


@snktshrma snktshrma commented Mar 16, 2026

  • Gazebo + ArduPilot SITL: Gazebo video stream (RTP from UDP 5600), configurable as forward-facing camera source in connection module. New blueprints: basic, basic-with-spatial, agentic; all registered.
  • Position & motion: MAVLink position target in local NED with velocity feedforward (type mask fixed).
  • New agent skills: position target, move-by-distance, and yaw (rotate to heading).
  • Tracking skill in Gazebo
  • Spatial memory support and a navigate-using-spatial-memory skill
  • Sim: MAVLink uses local position NED when present.
  • Docs: README section for Gazebo + ArduPilot (SITL, plugin, gz sim, sim_vehicle.py, DimOS).
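The "position target in local NED with velocity feedforward (type mask fixed)" bullet comes down to choosing the right SET_POSITION_TARGET_LOCAL_NED type_mask. A minimal sketch of the mask arithmetic (bit values mirror the MAVLink POSITION_TARGET_TYPEMASK enum; the helper itself is illustrative, not this PR's code):

```python
# MAVLink POSITION_TARGET_TYPEMASK bit values (common message set).
# A set bit means "ignore this field" in the setpoint message.
IGNORE_AX, IGNORE_AY, IGNORE_AZ = 0x040, 0x080, 0x100
IGNORE_YAW, IGNORE_YAW_RATE = 0x400, 0x800

def position_velocity_mask(use_yaw: bool = False) -> int:
    """Mask for a position setpoint with velocity feedforward:
    use x/y/z and vx/vy/vz, ignore accelerations (and optionally yaw)."""
    mask = IGNORE_AX | IGNORE_AY | IGNORE_AZ | IGNORE_YAW_RATE
    if not use_yaw:
        mask |= IGNORE_YAW
    return mask

print(position_velocity_mask())      # 3520: accel + yaw + yaw_rate ignored
print(position_velocity_mask(True))  # 2496: accel + yaw_rate ignored
```

With pymavlink, the resulting mask would be passed to `set_position_target_local_ned_send` along with `MAV_FRAME_LOCAL_NED` position and velocity fields.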
Screenshot from 2026-03-16 19-48-05

To-Do:

  • Add ardupilot-gazebo support with dimos
  • Add agentic control and object tracking/follow
  • Add spatial memory support and navigate by query to spatial database
  • Add Navigation stack support / separate avoidance and planning algorithm
  • Do hardware test

@snktshrma snktshrma changed the title feat: Ardupilot-Gazebo drone simulation with video stream and in a warehouse environment feat: Ardupilot support for hardware and Gazebo drone simulation with video stream and in a warehouse environment Mar 16, 2026
@snktshrma snktshrma changed the title feat: Ardupilot support for hardware and Gazebo drone simulation with video stream and in a warehouse environment feat: Ardupilot support for Gazebo (+ hardware) drone simulation with video stream and in a warehouse environment Mar 16, 2026
@uchibeke

This is a genuinely exciting project — natural language control for humanoids, quadrupeds, drones, and robotic arms is the kind of thing that makes the "agentic AI" category concrete in a way that purely software agents don't.

One thing that jumps out immediately from the architecture: physical hardware agents need pre-action authorization at a fundamentally higher assurance level than software agents. A hallucinated Jira write is recoverable. A hallucinated command to a robotic arm or drone is not.

The standard approach to this problem in software agents — prompt-layer instructions like "always ask before acting" — doesn't hold under adversarial conditions. Prompt injection can instruct an agent to skip confirmation steps. For physical hardware, that failure mode is unacceptable.

The pattern that works: before_tool_call hook enforcement

APort Agent Guardrails implements pre-action authorization at the platform hook level, not the prompt level. Every tool call is intercepted and evaluated against a YAML policy before it executes. The model cannot skip it — there's no prompt or agent response that bypasses the hook.

For physical agents in dimos, this maps to: capability scope enforcement before actuator commands reach hardware. You'd define a policy manifest for each robot/drone's authorized capabilities, and any tool call outside that scope is denied at the framework level before it propagates to the hardware adapter.

The underlying spec — the Open Agent Protocol (OAP), DOI: 10.5281/zenodo.18901596 — also defines agent passports: signed capability manifests that declare what an agent is authorized to do. In a multi-agent dimos workflow (say, a planner agent delegating to a hardware execution agent), passports give you chain-of-custody verification that the executing agent is actually scoped for physical actuation.

"When your agent commands a robotic arm, you need verified capability scope, not a prompt."

A few specifics for dimos:

  • No sidecar or infrastructure required — YAML policy + hook, zero overhead
  • Works at the framework level — integrates before commands reach hardware adapters
  • Public CTF results: 1,149 adversarial social-engineering attempts against the guardrail, 0 bypasses
  • Apache 2.0, open standard

The physical-hardware angle makes the authorization problem more urgent, not just more interesting. Happy to discuss how OAP might fit into the dimos architecture — whether as a framework-level gate before hardware commands, or as a passport layer for multi-agent physical workflows.
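As a generic illustration of the hook-level pattern described above (names and schema are hypothetical, not APort's actual API), a before-tool-call gate enforced in framework code rather than in the prompt might look like:

```python
# Hypothetical framework-level gate: every tool call is checked against a
# static capability policy before it can reach a hardware adapter.
from typing import Any, Callable

POLICY: dict[str, Any] = {  # would normally be loaded from a YAML manifest
    "allowed_tools": {"get_telemetry", "land"},
}

class Denied(Exception):
    """Raised when a tool call falls outside the agent's capability scope."""

def before_tool_call(tool: str, policy: dict[str, Any]) -> None:
    # Enforcement happens here; no model output can skip this check.
    if tool not in policy["allowed_tools"]:
        raise Denied(f"tool '{tool}' outside capability scope")

def call_tool(tool: str, fn: Callable[[], Any], policy: dict[str, Any] = POLICY) -> Any:
    before_tool_call(tool, policy)  # gate runs before the actuator command
    return fn()

print(call_tool("land", lambda: "landing"))   # in scope: executes
try:
    call_tool("arm_motors", lambda: "armed")  # out of scope: denied pre-execution
except Denied as e:
    print(e)
```

The point of the pattern is that the denial happens before `fn()` runs, so a prompt-injected "skip confirmation" instruction has nothing to bypass.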

Repo: https://github.com/aporthq/aport-agent-guardrails
Spec: https://github.com/aporthq/aport-spec
DOI: https://doi.org/10.5281/zenodo.18901596

@snktshrma snktshrma force-pushed the drone-gazebo branch 4 times, most recently from 3729b54 to 5e55505 on March 26, 2026 19:51
@snktshrma snktshrma marked this pull request as ready for review March 26, 2026 19:52
@greptile-apps

greptile-apps bot commented Mar 26, 2026

Greptile Summary

This PR adds full Gazebo + ArduPilot SITL simulation support to the DimOS drone stack: a new GazeboVideoStream module decodes H264/RTP from UDP 5600 via GStreamer; three new blueprint variants (drone-basic-gazebo, drone-agentic-gazebo, drone-agentic-gazebo-spatial) wire this into the existing pipeline; and new agent skills (move_by_distance, go_to_position, rotate_to, navigate_to_where_i_saw) extend local NED navigation. The MAVLink layer now prefers LOCAL_POSITION_NED for indoor odometry, rotate_to switches from GPS heading to ATTITUDE.yaw, and the visual servoing controller is refactored from PID-based strafe to proportional forward-speed + yaw-rate + vz control for a forward camera.

Key changes and issues found:

  • P1: follow_object was changed from a streaming generator to a fire-and-forget string return; the duration parameter no longer stops tracking after that many seconds — the drone tracks indefinitely regardless of what is passed.
  • P1: The docker dependency extra replaced opencv-python-headless with opencv-contrib-python==4.10.0.84, a full GUI-linked build. Headless Docker containers may require additional system libraries that were not needed before.
  • P2: DEFAULT_VERTICAL_ERROR_GAIN is 0.0012 in the class body but the __init__ docstring says "Default 0.0008".
  • P2: x_pid, y_pid, z_pid are instantiated in DroneVisualServoingController.__init__ but the new compute_velocity_control never reads their output, creating dead code.

Confidence Score: 3/5

Safe for Gazebo simulation use; the follow_object duration contract and Docker headless dependency issues should be addressed before wider deployment.

The Gazebo SITL path is well-structured and the new skills work correctly in isolation. However, two P1 issues reduce confidence: the follow_object skill silently ignores the duration bound (a regression from the generator implementation), and the Docker dependency swap from headless to GUI-linked OpenCV may break CI/Docker deployments.

Files needing attention: dimos/robot/drone/connection_module.py (follow_object duration regression), pyproject.toml (Docker opencv headless → contrib swap)

Important Files Changed

| Filename | Overview |
| --- | --- |
| dimos/robot/drone/connection_module.py | Adds move_by_distance, go_to_position, rotate_to skills and integrates GazeboVideoStream. The follow_object skill was changed from a generator to a fire-and-forget string, which silently breaks the duration tracking contract. |
| dimos/robot/drone/mavlink_connection.py | Adds move_by_distance_body_m, set_position_target, and rotate_to using ATTITUDE.yaw for heading; improves _publish_odom to prefer LOCAL_POSITION_NED over GPS in indoor mode; move_twist now includes yaw_rate support with correct type_mask. |
| dimos/robot/drone/drone_visual_servoing_controller.py | Refactored from PID-based strafe control to proportional-gain forward-speed + yaw-rate + vz control for a forward camera. PID controllers are still instantiated but never used (dead code). DEFAULT_VERTICAL_ERROR_GAIN = 0.0012 but the docstring says Default 0.0008. |
| dimos/robot/drone/gazebo_video_stream.py | New file: GStreamer-based H264/RTP UDP decoder for the Gazebo camera. The _error_monitor thread may briefly block on readline() after stop. Overall structure is sound. |
| dimos/robot/drone/drone_spatial_nav_skill.py | New skill: queries the SpatialMemory CLIP index by text, converts the stored world pose to local NED, and sends a position target. Coordinate-transform logic is consistent with mavlink_connection internal axes. |
| dimos/robot/drone/drone_tracking_module.py | Tracking now sends yaw_rate + vz instead of strafe vy; forward velocity is zeroed after FORWARD_VELOCITY_DURATION seconds; **kwargs forwarded to super(). Logic is consistent with the updated servoing controller. |
| pyproject.toml | Replaces opencv-python + opencv-python-headless with opencv-contrib-python==4.10.0.84 everywhere, including Docker. The Docker extra now links against GUI libraries where opencv-python-headless was previously used. |
| dimos/robot/drone/blueprints/agentic/drone_agentic.py | Adds drone_agentic_gazebo and drone_agentic_gazebo_spatial blueprints; system prompt updated with clearer skill descriptions. Straightforward composition. |
| dimos/robot/drone/blueprints/basic/drone_basic.py | Adds drone_basic_gazebo and drone_basic_gazebo_spatial blueprints composing DroneConnectionModule with video_source=gazebo and optional SpatialMemory. Clean composition. |
| dimos/robot/drone/specs.py | New Protocol/Spec definitions for SpatialMemoryQuerySpec and DroneGoToPositionSpec. Clean structural typing with no issues. |
| dimos/robot/drone/test_drone.py | New unit tests for move_by_distance delegation, NED fallback, and updated visual-servoing assertions. Tests correctly reflect the new API contract. |
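The "converts stored world pose to local NED" step in drone_spatial_nav_skill.py is, in the common case, an ENU-to-NED axis swap. A hedged sketch (assuming the stored world poses are ENU, which this review does not state explicitly):

```python
def enu_to_ned(x_e: float, y_e: float, z_e: float) -> tuple[float, float, float]:
    """Convert ENU (x east, y north, z up) to NED (x north, y east, z down).
    This is the standard frame swap; any map-origin offset would be applied first."""
    return (y_e, x_e, -z_e)

print(enu_to_ned(1.0, 2.0, 3.0))  # (2.0, 1.0, -3.0)
```

Applying it twice returns the original coordinates, which makes it easy to sanity-check in tests.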

Sequence Diagram

```mermaid
sequenceDiagram
    participant GZ as Gazebo SITL
    participant GVS as GazeboVideoStream
    participant GStreamer as gst-launch-1.0
    participant DCM as DroneConnectionModule
    participant MLC as MavlinkConnection
    participant DTM as DroneTrackingModule
    participant DSC as DroneVisualServoingController
    participant DSNS as DroneSpatialNavSkill
    participant SM as SpatialMemory

    GZ->>GStreamer: H264/RTP UDP:5600
    GStreamer->>GVS: raw RGB frames (stdout)
    GVS->>DCM: video Subject[Image]

    GZ->>MLC: MAVLink UDP:14550 (LOCAL_POSITION_NED, ATTITUDE)
    MLC->>DCM: telemetry + odometry

    DCM->>DTM: video frames
    DTM->>DSC: compute_velocity_control(target_x, target_y)
    DSC-->>DTM: (vx, vy=0, vz, yaw_rate)
    DTM->>DCM: Twist cmd_vel (move_twist)
    DCM->>MLC: SET_POSITION_TARGET_LOCAL_NED (body NED, vx+yaw_rate)
    MLC->>GZ: MAVLink velocity command

    Note over DSNS,SM: navigate_to_where_i_saw flow
    DSNS->>SM: query_by_text(description)
    SM-->>DSNS: [{metadata: {pos_x, pos_y, pos_z}}]
    DSNS->>DCM: go_to_position(ned_x, ned_y, ned_z)
    DCM->>MLC: set_position_target(x, y, z)
    MLC->>GZ: SET_POSITION_TARGET_LOCAL_NED (MAV_FRAME_LOCAL_NED)
```
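In the GStreamer-to-GazeboVideoStream hop of the diagram, the subprocess's stdout is a raw RGB byte stream, so frame extraction is fixed-size chunking. A sketch under assumed geometry (640x480 RGB; the function name is illustrative, not the module's actual API):

```python
WIDTH, HEIGHT, CHANNELS = 640, 480, 3   # assumed camera geometry
FRAME_BYTES = WIDTH * HEIGHT * CHANNELS

def frames_from_stream(buf: bytes) -> list[bytes]:
    """Split a raw RGB byte stream into fixed-size frames, dropping any partial
    tail. Each chunk would typically be reshaped to (HEIGHT, WIDTH, 3) downstream."""
    return [buf[off:off + FRAME_BYTES]
            for off in range(0, len(buf) - FRAME_BYTES + 1, FRAME_BYTES)]

data = bytes(FRAME_BYTES) * 2 + b"\x00" * 10  # two full frames plus a partial tail
print(len(frames_from_stream(data)))  # 2
```

The real module reads incrementally from the pipe rather than from one buffer, but the framing invariant (exactly WIDTH*HEIGHT*CHANNELS bytes per image) is the same.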

Comments Outside Diff (2)

  1. dimos/robot/drone/connection_module.py, line 425-486 (link)

    P1 follow_object silently ignores the duration tracking contract

    The old implementation was a generator that would yield status updates and keep monitoring for the full duration seconds, then stop the drone when it expired. The new implementation:

    1. Sends the command
    2. Polls for at most min(duration, 30.0) seconds for an initial confirmation
    3. Returns a string — the drone then tracks indefinitely (or until stop_follow) regardless of the duration argument

    This means follow_object("person", duration=10) does not stop tracking after 10 seconds — it just caps the initial confirmation window at 10 s. An agent calling this with an explicit duration to bound tracking time will be silently surprised when tracking never stops on its own.

    The DroneTrackingModule.track_object method still respects duration <= 0 → run forever, so the agent has no way to pass a meaningful duration through the new skill API. Consider one of:

    • Sending a stop command after duration has elapsed (in a background thread)
    • Documenting clearly that duration only controls the initial status-wait window, not the tracking lifetime
    • Renaming the parameter to status_timeout to avoid confusion
  2. dimos/robot/drone/gazebo_video_stream.py, line 1007-1015 (link)

    P2 _error_monitor may hang briefly on readline() after stop()

    The loop guard checks not self._stop_event.is_set() before each iteration, but the actual blocking call self._process.stderr.readline() runs inside the loop after the guard has already passed. When stop() terminates the subprocess, the stderr pipe will eventually close and readline() returns b"", but there can be a delay if the OS buffers the pipe close.

    Because the thread is daemon=True this won't prevent process exit, but it can delay clean teardown during testing. A common pattern is to readline() first and break on empty bytes:

    ```python
    def _error_monitor(self) -> None:
        while self._process and self._process.stderr:
            line = self._process.stderr.readline()
            if not line:
                break
            if self._stop_event.is_set():
                break
            s = line.decode("utf-8", errors="replace").strip()
            if "ERROR" in s or "WARNING" in s:
                logger.warning("GStreamer: %s", s)
    ```
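For the follow_object duration regression in the first comment, the background-stop option can be as small as a threading.Timer. A sketch where start_follow/stop_follow stand in for whatever start and stop entry points the module actually exposes:

```python
import threading
from typing import Callable, Optional

def follow_with_bound(start_follow: Callable[[], None],
                      stop_follow: Callable[[], None],
                      duration: float) -> Optional[threading.Timer]:
    """Start tracking, then stop it after `duration` seconds in the background.
    duration <= 0 preserves the old 'run until stop_follow' behaviour."""
    start_follow()
    if duration > 0:
        timer = threading.Timer(duration, stop_follow)
        timer.daemon = True  # never blocks process exit
        timer.start()
        return timer
    return None

state = {"tracking": False}
timer = follow_with_bound(lambda: state.update(tracking=True),
                          lambda: state.update(tracking=False),
                          duration=0.05)
timer.join()              # wait for the bound to fire (demo only)
print(state["tracking"])  # False
```

This keeps the skill fire-and-forget from the agent's point of view while restoring the contract that follow_object("person", duration=10) actually stops after 10 seconds.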

Reviews (1): Last reviewed commit: "feat: Added spatial memory and query to ..."

Comment on lines +31 to +57
```python
DEFAULT_VERTICAL_ERROR_GAIN = 0.0012
MAX_VZ = 0.45  # m/s
# Gain: lateral pixel error -> yaw rate (rad/s). Object right of center -> positive yaw_rate (turn right).
DEFAULT_LATERAL_ERROR_TO_YAW_RATE = 0.001
MAX_YAW_RATE = 0.5  # rad/s

def __init__(
    self,
    x_pid_params: PIDParams,
    y_pid_params: PIDParams,
    z_pid_params: PIDParams | None = None,
    forward_camera: bool = True,
    forward_speed: float | None = None,
    vertical_error_gain: float | None = None,
    lateral_error_to_yaw_rate: float | None = None,
) -> None:
    """
    Initialize drone visual servoing controller.

    Args:
        x_pid_params: (kp, ki, kd, output_limits, integral_limit, deadband) for forward/back
        y_pid_params: (kp, ki, kd, output_limits, integral_limit, deadband) for left/right
        z_pid_params: Optional params for altitude control
        x_pid_params: Reserved for forward from image later.
        y_pid_params: Reserved (lateral error drives yaw rate, not strafe).
        z_pid_params: Optional; unused when using vertical_error_gain.
        forward_camera: Reserved for later.
        forward_speed: Constant vx (m/s). Default 0.2.
        vertical_error_gain: Image vertical error (px) -> vz. Default 0.0008.
        lateral_error_to_yaw_rate: Image lateral error (px) -> yaw_rate (rad/s). Default 0.001.
```

P2 Default gain docstring mismatch

DEFAULT_VERTICAL_ERROR_GAIN is 0.0012 at the class level, but the __init__ docstring says the default is 0.0008. Any user who reads the docstring to understand the gain they're inheriting will configure their system based on the wrong value.

Suggested change

```diff
         forward_speed: Constant vx (m/s). Default 0.2.
-        vertical_error_gain: Image vertical error (px) -> vz. Default 0.0008.
+        vertical_error_gain: Image vertical error (px) -> vz. Default 0.0012.
         lateral_error_to_yaw_rate: Image lateral error (px) -> yaw_rate (rad/s). Default 0.001.
```

@@ -322,7 +322,7 @@ docker = [
"pydantic-settings>=2.11.0,<3",

P1 opencv-python-headless replaced by GUI-linked opencv-contrib-python in Docker

The docker extra previously declared opencv-python-headless, which is a headless build specifically designed for server / Docker environments (no libGL, no libGTK, no display-related .so requirements). This PR replaces it with opencv-contrib-python==4.10.0.84, which is a full GUI build.

Consequences:

  • Headless Docker images without libGL.so.1 / GTK may get import-time warnings or ImportError for display-related modules.
  • opencv-contrib-python==4.10.0.84 is several releases behind the opencv-python==4.13.0.92 still present in the lockfile, creating a version skew in the dependency graph that can produce subtle runtime behaviour differences.

If opencv-contrib modules (e.g. cv2.TrackerCSRT_create) are needed, consider keeping the headless variant or using a separate optional extra for GUI support:

Suggested change
"pydantic-settings>=2.11.0,<3",
"opencv-contrib-python-headless==4.10.0.84",

Comment on lines 37 to +71
```python
def __init__(
    self,
    x_pid_params: PIDParams,
    y_pid_params: PIDParams,
    z_pid_params: PIDParams | None = None,
    forward_camera: bool = True,
    forward_speed: float | None = None,
    vertical_error_gain: float | None = None,
    lateral_error_to_yaw_rate: float | None = None,
) -> None:
    """
    Initialize drone visual servoing controller.

    Args:
        x_pid_params: (kp, ki, kd, output_limits, integral_limit, deadband) for forward/back
        y_pid_params: (kp, ki, kd, output_limits, integral_limit, deadband) for left/right
        z_pid_params: Optional params for altitude control
        x_pid_params: Reserved for forward from image later.
        y_pid_params: Reserved (lateral error drives yaw rate, not strafe).
        z_pid_params: Optional; unused when using vertical_error_gain.
        forward_camera: Reserved for later.
        forward_speed: Constant vx (m/s). Default 0.2.
        vertical_error_gain: Image vertical error (px) -> vz. Default 0.0008.
        lateral_error_to_yaw_rate: Image lateral error (px) -> yaw_rate (rad/s). Default 0.001.
    """
    self.x_pid = PIDController(*x_pid_params)
    self.y_pid = PIDController(*y_pid_params)
    self.z_pid = PIDController(*z_pid_params) if z_pid_params else None
    self.forward_camera = forward_camera
    self.forward_speed = forward_speed if forward_speed is not None else self.DEFAULT_FORWARD_SPEED
    self.vertical_error_gain = (
        vertical_error_gain if vertical_error_gain is not None else self.DEFAULT_VERTICAL_ERROR_GAIN
    )
    self.lateral_error_to_yaw_rate = (
        lateral_error_to_yaw_rate
        if lateral_error_to_yaw_rate is not None
        else self.DEFAULT_LATERAL_ERROR_TO_YAW_RATE
    )
```

P2 PID controllers initialized but never used in compute_velocity_control

x_pid, y_pid, and z_pid are still instantiated but compute_velocity_control now uses only proportional gains — the PID state is never read during a tracking session. The docstring says the params are "reserved for future use", but dead code that is visibly constructed and reset (see reset()) can mislead maintainers into thinking tuning x_pid_params/y_pid_params affects tracking behaviour.

Consider either:

  • Removing the PID instantiation and reset() entries until the feature is actually implemented, or
  • Adding a runtime-visible comment when reset() is called so it's clear the PID state isn't driving control output yet
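The proportional law that replaced the PIDs (constant forward speed, lateral pixel error to yaw rate, vertical pixel error to vz) fits in a few lines. Gains and limits below mirror the defaults quoted in this review; the function itself and its sign conventions are an illustrative sketch, not the module's actual implementation:

```python
def clamp(v: float, limit: float) -> float:
    return max(-limit, min(limit, v))

def compute_velocity_control(err_x_px: float, err_y_px: float,
                             forward_speed: float = 0.2,
                             k_yaw: float = 0.001,   # DEFAULT_LATERAL_ERROR_TO_YAW_RATE
                             k_vz: float = 0.0012,   # DEFAULT_VERTICAL_ERROR_GAIN
                             max_yaw_rate: float = 0.5,
                             max_vz: float = 0.45) -> tuple[float, float, float, float]:
    """Forward-camera servoing: vx is constant, lateral error steers yaw rate,
    vertical error drives vz; vy (strafe) is always zero."""
    yaw_rate = clamp(k_yaw * err_x_px, max_yaw_rate)  # object right of center -> turn right
    vz = clamp(k_vz * err_y_px, max_vz)               # sign convention assumed here
    return forward_speed, 0.0, vz, yaw_rate           # (vx, vy, vz, yaw_rate)

print(compute_velocity_control(100.0, -50.0))
```

Because only proportional gains appear, none of the PID state (integral, derivative, deadband) influences the output, which is exactly the dead-code concern raised above.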

ouazmourad added 3 commits to ouazmourad/dimos-21-days-sprint that referenced this pull request Mar 26, 2026