AndroidStateProvider should apply screenshot resize + coordinate_scale when vision is enabled

## Problem

With `AndroidStateProvider` (`vision_only=False`), the LLM receives the raw native screenshot, such as 1080x2400, while `coordinate_scale_x/y` stays at 1.0. The vision model downscales the image internally and outputs coordinates in that downscaled space. Those coordinates are then passed to `click_at` without correction.

Example: the LLM estimates an X dismiss button at (651, 230), but the real device position is about (1020, 170). The ratio 651/1020 = 0.638 matches the model's apparent downscale factor on the x axis. A retry at (976, 347) did hit the button, and 976 is about 651 * 1.5, roughly the inverse of that scale.
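The arithmetic in the example can be checked with a few lines of Python. This only re-derives the x-axis ratios from the numbers reported above; the variable names are illustrative and not taken from the codebase.

```python
# Numbers taken from the example above (x axis only).
llm_x = 651      # x coordinate the LLM produced
device_x = 1020  # approximate true x position on the device
retry_x = 976    # x coordinate of the retry that worked

# Ratio between what the LLM reported and where the button really is.
apparent_downscale = llm_x / device_x

# Ratio between the successful retry and the first attempt.
retry_upscale = retry_x / llm_x

print(f"apparent downscale: {apparent_downscale:.3f}")
print(f"retry upscale:      {retry_upscale:.3f}")
```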

`ScreenshotOnlyStateProvider` already handles this. It resizes the screenshot with `fit_dimensions_to_max_side`, reports the resized dimensions in `formatted_text`, and sets `coordinate_scale_x = input_width / screen_width` so `UIState.convert_point` maps LLM coordinates back to device pixels.
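A minimal sketch of that resize-then-scale flow, under stated assumptions: `fit_to_max_side`, `ScaledState`, and `to_device_point` are hypothetical stand-ins written for this issue, not the real `fit_dimensions_to_max_side` or `UIState.convert_point`, and the `max_side=1200` value is made up for illustration.

```python
from dataclasses import dataclass


def fit_to_max_side(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Hypothetical stand-in: shrink (width, height) so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    ratio = max_side / longest
    return max(1, round(width * ratio)), max(1, round(height * ratio))


@dataclass
class ScaledState:
    """Hypothetical stand-in for the state object that carries the scale factors."""

    scale_x: float
    scale_y: float

    def to_device_point(self, x: float, y: float) -> tuple[int, int]:
        # Map a point in resized-screenshot space back to device pixels.
        return round(x * self.scale_x), round(y * self.scale_y)


native_w, native_h = 1080, 2400
resized_w, resized_h = fit_to_max_side(native_w, native_h, max_side=1200)

# Scale factors are native size divided by the size the LLM actually sees.
state = ScaledState(native_w / resized_w, native_h / resized_h)

# A point the LLM picked on the resized screenshot, mapped to device pixels.
device_point = state.to_device_point(510, 85)
print(resized_w, resized_h, device_point)
```

With these assumed values the resized screenshot is half the native size, so each LLM coordinate is doubled before the tap is sent to the device.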

`AndroidStateProvider` skips that logic. The a11y `click` path works when elements exist in the tree because coordinates come from element bounds. Some Compose overlays, tooltips, and popups are missing from the a11y tree, so the agent falls back to screenshot-based `click_at`. That path still uses `convert_point` with scale 1.0, so LLM coordinates land in the wrong device-pixel location.


## Reproduction

Use an Android device whose native resolution exceeds the model's internal processing resolution, which is true for most devices. Run with `vision_only=False` and either `fast_agent.vision=True` or `manager.vision=True`.

Steps:

- Start from a fresh Reddit installation.
- Open Reddit and navigate to the profile screen.
- Try to dismiss the tooltip.

The agent uses `click` for a11y-indexed elements, which works because those coordinates come from bounds. It falls back to `click_at` for Compose overlays, tooltips, or popups that are missing from the a11y tree. Those coordinate clicks are systematically off by the downscale factor.


## Suggested fix

When vision is enabled on any sub-agent, apply the same `fit_dimensions_to_max_side` and `coordinate_scale` logic that `ScreenshotOnlyStateProvider` already uses:

```python
# In AndroidStateProvider.get_state(), when vision is active:
screen_width, screen_height = fit_dimensions_to_max_side(native_width, native_height)
# ...
coordinate_scale_x = native_width / screen_width
coordinate_scale_y = native_height / screen_height
```

The `click` path resolves coordinates from element bounds, so it bypasses `convert_point` and is not affected. Only `click_at` and `click_area` go through `convert_point`, where the scale correction is needed.
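The difference between the two paths can be illustrated as follows. This is a sketch, not the project's code: `center_of_bounds` and `scale_llm_point` are hypothetical helpers, and the element bounds and the 2.0 scale factor are made-up values chosen so the numbers line up with the example position (1020, 170).

```python
def center_of_bounds(left: int, top: int, right: int, bottom: int) -> tuple[int, int]:
    """Element-based tap: bounds are already in device pixels, so no scaling is applied."""
    return (left + right) // 2, (top + bottom) // 2


def scale_llm_point(x: float, y: float, scale_x: float, scale_y: float) -> tuple[int, int]:
    """Coordinate-based tap: the LLM point is in screenshot space and must be scaled."""
    return round(x * scale_x), round(y * scale_y)


# Hypothetical device-pixel bounds of the dismiss button.
close_button_bounds = (990, 140, 1050, 200)
element_tap = center_of_bounds(*close_button_bounds)

# The same button as the LLM might see it on a half-size screenshot.
llm_x, llm_y = 510, 85
uncorrected_tap = scale_llm_point(llm_x, llm_y, 1.0, 1.0)  # current behaviour, scale 1.0
corrected_tap = scale_llm_point(llm_x, llm_y, 2.0, 2.0)    # with the scale applied

print(element_tap, uncorrected_tap, corrected_tap)
```

Under these assumptions the element-based tap and the corrected coordinate tap agree, while the uncorrected tap lands in a different part of the screen.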

The screenshot sent to the LLM should match the resized dimensions so the model's coordinate output aligns with the scale factor. The grid overlay from `resize_image_to_max_side_with_grid` could also be applied here. It is currently gated on `requires_coordinate_tools`, but resize plus scale is the critical fix.

## Workarounds attempted (from the scanner side)

- Added a coordinate grid overlay at the driver level. The grid labels appeared on the screenshot, but the LLM read the labels, mapped the button to the wrong grid cell, and still output the same wrong coordinates.
- Added explicit screen dimensions to the prompt (`device screen is 1080x2400 pixels`). The LLM acknowledged the dimensions and noted the scaling factor, but still output coordinates in its internal resolution.
- Reverted both changes because neither helps without coordinate scale correction.

<img width="560" height="1200" alt="Image" src="https://github.com/user-attachments/assets/2c7e7466-be22-4e24-a2cd-a9aaff999c16" />

