feat: Ardupilot support for Gazebo (+ hardware) drone simulation with video stream and in a warehouse environment #1576
This is a genuinely exciting project: natural language control for humanoids, quadrupeds, drones, and robotic arms is the kind of thing that makes the "agentic AI" category concrete in a way that purely software agents don't.
One thing that jumps out immediately from the architecture: physical hardware agents need pre-action authorization at a fundamentally higher assurance level than software agents. A hallucinated Jira write is recoverable. A hallucinated command to a robotic arm or drone is not.
The standard approach to this problem in software agents, prompt-layer instructions like "always ask before acting", doesn't hold under adversarial conditions. Prompt injection can instruct an agent to skip confirmation steps. For physical hardware, that failure mode is unacceptable.
The pattern that works: APort Agent Guardrails implements pre-action authorization at the platform hook level, not the prompt level. Every tool call is intercepted and evaluated against a YAML policy before it executes. The model cannot skip it; there's no prompt or agent response that bypasses the hook.
For physical agents in dimos, this maps to capability scope enforcement before actuator commands reach hardware. You'd define a policy manifest for each robot/drone's authorized capabilities, and any tool call outside that scope is denied at the framework level before it propagates to the hardware adapter.
The underlying spec, the Open Agent Protocol (OAP), DOI: 10.5281/zenodo.18901596, also defines agent passports: signed capability manifests that declare what an agent is authorized to do. In a multi-agent dimos workflow (say, a planner agent delegating to a hardware execution agent), passports give you chain-of-custody verification that the executing agent is actually scoped for physical actuation.
A few specifics for dimos:
The physical-hardware angle makes the authorization problem more urgent, not just more interesting. Happy to discuss how OAP might fit into the dimos architecture, whether as a framework-level gate before hardware commands or as a passport layer for multi-agent physical workflows. Repo: https://github.com/aporthq/aport-agent-guardrails
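The hook-level gate described in the comment above can be sketched in a few lines. This is a hypothetical illustration only, not the APort or OAP API: `CapabilityPolicy`, `PolicyViolation`, `gated`, the `speed_mps` limit, and the two tool functions are all names assumed for the example, and the policy is a plain Python object rather than a YAML manifest.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Any, Callable


class PolicyViolation(Exception):
    """Raised when a tool call falls outside the authorized scope."""


@dataclass(frozen=True)
class CapabilityPolicy:
    """Hypothetical per-robot policy: allowed tool names plus one numeric limit."""
    allowed_tools: frozenset
    max_speed_mps: float = 0.5

    def check(self, tool: str, kwargs: dict) -> None:
        if tool not in self.allowed_tools:
            raise PolicyViolation(f"tool '{tool}' is not in the authorized scope")
        speed = kwargs.get("speed_mps")
        if speed is not None and abs(speed) > self.max_speed_mps:
            raise PolicyViolation(f"speed {speed} m/s exceeds limit {self.max_speed_mps} m/s")


def gated(policy: CapabilityPolicy) -> Callable:
    """Wrap a tool so the policy is evaluated before the tool body runs."""
    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @wraps(fn)
        def wrapper(**kwargs: Any) -> Any:
            policy.check(fn.__name__, kwargs)  # deny before anything reaches hardware
            return fn(**kwargs)
        return wrapper
    return decorator


drone_policy = CapabilityPolicy(allowed_tools=frozenset({"move_forward"}), max_speed_mps=0.5)
sent_commands = []  # stands in for the hardware adapter


@gated(drone_policy)
def move_forward(speed_mps: float) -> str:
    sent_commands.append(("move_forward", speed_mps))
    return "ok"


@gated(drone_policy)
def disarm_in_flight() -> str:
    sent_commands.append(("disarm_in_flight", 0.0))
    return "ok"
```

The point of the design is that the check lives in the wrapper, outside anything the model can influence through its output: a denied call raises before the tool body runs, so nothing is appended to `sent_commands`.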
3729b54 to 5e55505
… tracking, spatial model and warehouse environment
Greptile Summary
This PR adds full Gazebo + ArduPilot SITL simulation support to the DimOS drone stack: a new …
Confidence Score: 3/5
Safe for Gazebo simulation use; the follow_object duration contract and Docker headless dependency issues should be addressed before wider deployment.
The Gazebo SITL path is well-structured and the new skills work correctly in isolation. However, two P1 issues reduce confidence: the follow_object skill silently ignores the duration bound (a regression from the generator implementation), and the Docker dependency swap from headless to GUI-linked OpenCV may break CI/Docker deployments.
Files needing attention: dimos/robot/drone/connection_module.py (follow_object duration regression), pyproject.toml (Docker opencv headless → contrib swap)
Important Files Changed
Sequence Diagram
sequenceDiagram
participant GZ as Gazebo SITL
participant GVS as GazeboVideoStream
participant GStreamer as gst-launch-1.0
participant DCM as DroneConnectionModule
participant MLC as MavlinkConnection
participant DTM as DroneTrackingModule
participant DSC as DroneVisualServoingController
participant DSNS as DroneSpatialNavSkill
participant SM as SpatialMemory
GZ->>GStreamer: H264/RTP UDP:5600
GStreamer->>GVS: raw RGB frames (stdout)
GVS->>DCM: video Subject[Image]
GZ->>MLC: MAVLink UDP:14550 (LOCAL_POSITION_NED, ATTITUDE)
MLC->>DCM: telemetry + odometry
DCM->>DTM: video frames
DTM->>DSC: compute_velocity_control(target_x, target_y)
DSC-->>DTM: (vx, vy=0, vz, yaw_rate)
DTM->>DCM: Twist cmd_vel (move_twist)
DCM->>MLC: SET_POSITION_TARGET_LOCAL_NED (body NED, vx+yaw_rate)
MLC->>GZ: MAVLink velocity command
Note over DSNS,SM: navigate_to_where_i_saw flow
DSNS->>SM: query_by_text(description)
SM-->>DSNS: [{metadata: {pos_x, pos_y, pos_z}}]
DSNS->>DCM: go_to_position(ned_x, ned_y, ned_z)
DCM->>MLC: set_position_target(x, y, z)
MLC->>GZ: SET_POSITION_TARGET_LOCAL_NED (MAV_FRAME_LOCAL_NED)
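The DTM to DSC step in the diagram, which turns a pixel target into `(vx, vy=0, vz, yaw_rate)`, can be sketched as a clamped proportional map. This is a minimal sketch, not the PR's `compute_velocity_control`: the gains and limits are the constants shown in the diff, the 0.2 m/s forward speed is the documented default, and the function name, the explicit image-center arguments, and the sign of `vz` (image y grows downward, positive `vz` descends as in NED) are assumptions.

```python
FORWARD_SPEED = 0.2                 # m/s, documented default for forward_speed
VERTICAL_ERROR_GAIN = 0.0012        # (m/s) per pixel of vertical error
MAX_VZ = 0.45                       # m/s
LATERAL_ERROR_TO_YAW_RATE = 0.001   # (rad/s) per pixel of lateral error
MAX_YAW_RATE = 0.5                  # rad/s


def clamp(value: float, limit: float) -> float:
    """Limit value to the symmetric range [-limit, +limit]."""
    return max(-limit, min(limit, value))


def velocity_from_pixel_error(
    target_x: float, target_y: float, center_x: float, center_y: float
) -> tuple:
    """Return (vx, vy, vz, yaw_rate) for a target at (target_x, target_y) pixels.

    Assumed conventions: error = target - image center; an object right of
    center gives a positive yaw_rate (turn right); an object below center
    gives a positive vz.
    """
    error_x = target_x - center_x
    error_y = target_y - center_y
    yaw_rate = clamp(LATERAL_ERROR_TO_YAW_RATE * error_x, MAX_YAW_RATE)
    vz = clamp(VERTICAL_ERROR_GAIN * error_y, MAX_VZ)
    return (FORWARD_SPEED, 0.0, vz, yaw_rate)  # vy stays 0: lateral error drives yaw, not strafe
```

With these gains the yaw rate saturates at 500 px of lateral error and the vertical rate at 375 px of vertical error, so for typical frame sizes the clamps only matter when the target sits near the image edge.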
    DEFAULT_VERTICAL_ERROR_GAIN = 0.0012
    MAX_VZ = 0.45  # m/s
    # Gain: lateral pixel error -> yaw rate (rad/s). Object right of center -> positive yaw_rate (turn right).
    DEFAULT_LATERAL_ERROR_TO_YAW_RATE = 0.001
    MAX_YAW_RATE = 0.5  # rad/s

    def __init__(
        self,
        x_pid_params: PIDParams,
        y_pid_params: PIDParams,
        z_pid_params: PIDParams | None = None,
        forward_camera: bool = True,
        forward_speed: float | None = None,
        vertical_error_gain: float | None = None,
        lateral_error_to_yaw_rate: float | None = None,
    ) -> None:
        """
        Initialize drone visual servoing controller.

        Args:
            x_pid_params: (kp, ki, kd, output_limits, integral_limit, deadband) for forward/back
            y_pid_params: (kp, ki, kd, output_limits, integral_limit, deadband) for left/right
            z_pid_params: Optional params for altitude control
            x_pid_params: Reserved for forward from image later.
            y_pid_params: Reserved (lateral error drives yaw rate, not strafe).
            z_pid_params: Optional; unused when using vertical_error_gain.
            forward_camera: Reserved for later.
            forward_speed: Constant vx (m/s). Default 0.2.
            vertical_error_gain: Image vertical error (px) -> vz. Default 0.0008.
            lateral_error_to_yaw_rate: Image lateral error (px) -> yaw_rate (rad/s). Default 0.001.
Default gain docstring mismatch
DEFAULT_VERTICAL_ERROR_GAIN is 0.0012 at the class level, but the __init__ docstring says the default is 0.0008. Any user who reads the docstring to understand the gain they're inheriting will configure their system based on the wrong value.
-            forward_speed: Constant vx (m/s). Default 0.2.
-            vertical_error_gain: Image vertical error (px) -> vz. Default 0.0008.
-            lateral_error_to_yaw_rate: Image lateral error (px) -> yaw_rate (rad/s). Default 0.001.
+            forward_speed: Constant vx (m/s). Default 0.2.
+            vertical_error_gain: Image vertical error (px) -> vz. Default 0.0012.
+            lateral_error_to_yaw_rate: Image lateral error (px) -> yaw_rate (rad/s). Default 0.001.
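A mismatch like the one flagged in this comment can be caught mechanically by comparing the "Default N" values in the docstring against the class constants. The sketch below is one possible unit-test helper, not part of the PR: `Controller` is a stand-in class that reproduces the mismatch, the value of `DEFAULT_FORWARD_SPEED` is taken from the docstring, and `documented_defaults` and `mismatched_defaults` are hypothetical names.

```python
import re


class Controller:
    """Stand-in for the controller under review; reproduces the docstring mismatch."""

    DEFAULT_FORWARD_SPEED = 0.2
    DEFAULT_VERTICAL_ERROR_GAIN = 0.0012

    def __init__(self, forward_speed=None, vertical_error_gain=None):
        """
        Args:
            forward_speed: Constant vx (m/s). Default 0.2.
            vertical_error_gain: Image vertical error (px) -> vz. Default 0.0008.
        """


def documented_defaults(docstring: str) -> dict:
    """Map each documented parameter to the number that follows 'Default' on its line."""
    pattern = re.compile(r"(\w+):[^\n]*Default\s+(\d+(?:\.\d+)?)")
    return {name: float(value) for name, value in pattern.findall(docstring)}


def mismatched_defaults(cls, param_to_constant: dict) -> dict:
    """Return {param: (documented, actual)} for every documented default that is wrong."""
    documented = documented_defaults(cls.__init__.__doc__ or "")
    return {
        param: (documented[param], getattr(cls, constant))
        for param, constant in param_to_constant.items()
        if param in documented and documented[param] != getattr(cls, constant)
    }
```

Run as a test, the helper would report `vertical_error_gain` as documented 0.0008 against an actual 0.0012 and stay silent about `forward_speed`.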
@@ -322,7 +322,7 @@ docker = [
    "pydantic-settings>=2.11.0,<3",
opencv-python-headless replaced by GUI-linked opencv-contrib-python in Docker
The docker extra previously declared opencv-python-headless, which is a headless build specifically designed for server / Docker environments (no libGL, no libGTK, no display-related .so requirements). This PR replaces it with opencv-contrib-python==4.10.0.84, which is a full GUI build.
Consequences:
- Headless Docker images without libGL.so.1 / GTK may get import-time warnings or ImportError for display-related modules.
- opencv-contrib-python==4.10.0.84 is three minor releases behind the opencv-python==4.13.0.92 still present in the lockfile, creating a version skew in the dependency graph that can produce subtle runtime behaviour differences.
If opencv-contrib modules (e.g. cv2.TrackerCSRT_create) are needed, consider the headless contrib variant, or a separate optional extra for GUI support:
    "pydantic-settings>=2.11.0,<3",
    "opencv-contrib-python-headless==4.10.0.84",
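Both concerns in this comment, a GUI-linked wheel in a headless image and more than one OpenCV wheel in the same environment, can be checked at image build time. The following is a sketch under stated assumptions: the function names are hypothetical, and the check relies only on `importlib.metadata` from the standard library and on the four OpenCV wheel names published on PyPI, all of which provide the same `cv2` package.

```python
from importlib import metadata

OPENCV_VARIANTS = {
    "opencv-python",
    "opencv-contrib-python",
    "opencv-python-headless",
    "opencv-contrib-python-headless",
}


def installed_opencv_variants() -> dict:
    """Return {distribution name: version} for every OpenCV wheel in this environment."""
    found = {}
    for dist in metadata.distributions():
        name = (dist.metadata["Name"] or "").lower().replace("_", "-")
        if name in OPENCV_VARIANTS:
            found[name] = dist.version
    return found


def opencv_problems(installed: dict, require_headless: bool = True) -> list:
    """Describe conflicts: several wheels at once, or a GUI wheel where headless is required."""
    problems = []
    if len(installed) > 1:
        pins = ", ".join(f"{name}=={version}" for name, version in sorted(installed.items()))
        problems.append(f"multiple OpenCV wheels installed: {pins}")
    if require_headless:
        gui = sorted(name for name in installed if not name.endswith("-headless"))
        if gui:
            problems.append("GUI-linked wheel(s) in a headless image: " + ", ".join(gui))
    return problems
```

A Docker build step could call `opencv_problems(installed_opencv_variants())` and fail the build when the returned list is non-empty.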
    def __init__(
        self,
        x_pid_params: PIDParams,
        y_pid_params: PIDParams,
        z_pid_params: PIDParams | None = None,
        forward_camera: bool = True,
        forward_speed: float | None = None,
        vertical_error_gain: float | None = None,
        lateral_error_to_yaw_rate: float | None = None,
    ) -> None:
        """
        Initialize drone visual servoing controller.

        Args:
            x_pid_params: (kp, ki, kd, output_limits, integral_limit, deadband) for forward/back
            y_pid_params: (kp, ki, kd, output_limits, integral_limit, deadband) for left/right
            z_pid_params: Optional params for altitude control
            x_pid_params: Reserved for forward from image later.
            y_pid_params: Reserved (lateral error drives yaw rate, not strafe).
            z_pid_params: Optional; unused when using vertical_error_gain.
            forward_camera: Reserved for later.
            forward_speed: Constant vx (m/s). Default 0.2.
            vertical_error_gain: Image vertical error (px) -> vz. Default 0.0008.
            lateral_error_to_yaw_rate: Image lateral error (px) -> yaw_rate (rad/s). Default 0.001.
        """
        self.x_pid = PIDController(*x_pid_params)
        self.y_pid = PIDController(*y_pid_params)
        self.z_pid = PIDController(*z_pid_params) if z_pid_params else None
        self.forward_camera = forward_camera
        self.forward_speed = forward_speed if forward_speed is not None else self.DEFAULT_FORWARD_SPEED
        self.vertical_error_gain = (
            vertical_error_gain if vertical_error_gain is not None else self.DEFAULT_VERTICAL_ERROR_GAIN
        )
        self.lateral_error_to_yaw_rate = (
            lateral_error_to_yaw_rate
            if lateral_error_to_yaw_rate is not None
            else self.DEFAULT_LATERAL_ERROR_TO_YAW_RATE
        )
PID controllers initialized but never used in compute_velocity_control
x_pid, y_pid, and z_pid are still instantiated, but compute_velocity_control now uses only proportional gains, so the PID state is never read during a tracking session. The docstring marks the params as reserved for later use, but dead code that is visibly constructed and reset (see reset()) can mislead maintainers into thinking that tuning x_pid_params/y_pid_params affects tracking behaviour.
Consider either:
- Removing the PID instantiation and reset() entries until the feature is actually implemented, or
- Emitting a runtime-visible log message when reset() is called so it's clear the PID state isn't driving control output yet
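One way to make the unused parameters visible at runtime, in the spirit of the second option, is to warn when PID parameters are supplied while the control law ignores them. This is a hypothetical sketch: `ServoingControllerSketch` is a stand-in rather than the DimOS controller, it makes the PID parameters optional (the real signature requires `x_pid_params` and `y_pid_params`), and it warns at construction rather than in `reset()`.

```python
import warnings


class ServoingControllerSketch:
    """Stand-in controller that accepts PID params but runs proportional-only control."""

    def __init__(self, x_pid_params=None, y_pid_params=None, z_pid_params=None):
        supplied = {
            "x_pid_params": x_pid_params,
            "y_pid_params": y_pid_params,
            "z_pid_params": z_pid_params,
        }
        # Record which PID parameters were passed even though nothing reads them.
        self.unused_pid_params = [name for name, value in supplied.items() if value is not None]
        if self.unused_pid_params:
            warnings.warn(
                ", ".join(self.unused_pid_params)
                + " accepted but not used: control output is proportional-only",
                UserWarning,
                stacklevel=2,
            )
```

The warning fires once per constructed controller, which keeps the signal visible in logs without repeating it on every control tick.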
To-Do: