
Capture contextual screenshots for references like “here” or “right here” #77

@technohdyt

Description


Feature request: Capture contextual screenshots for deictic references like “here” or “right here”

Summary

Clicky should automatically capture a fresh screenshot and pass it to the model when the user uses contextual (deictic) language such as “here,” “right here,” “this,” “that,” “over there,” or similar phrases that point to something on screen.

Current behavior

  • Clicky receives the initial screenshot context when a request starts.
  • If the user verbally points at something with language like “right here,” the agent may not get a new screenshot at the exact moment of reference.
  • This can leave the request ambiguous, especially when the user is emphasizing or clarifying which visible UI element they mean.

Expected behavior

  • When Clicky detects visual-reference phrases like “here,” “right here,” “this button,” “that thing,” or “over there,” it should capture another screenshot immediately.
  • The fresh screenshot should be included with the next agent/assistant turn as additional context.
  • This should work during both normal companion conversations and Clicky Agent/Codex handoffs.
  • The new screenshot should be treated as additive context, not a replacement for the transcript or prior screenshots.
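One way to read "additive context" is sketched below in Python. It is a minimal illustration, not Clicky's data model: `Screenshot`, `TurnContext`, `seq`, `reason`, and `image_ref` are all hypothetical names. Each capture is appended with a sequence number and a reason, so the initial screenshot and any later "right here" screenshots reach the model side by side with the transcript instead of replacing one another.

```python
from dataclasses import dataclass, field


@dataclass
class Screenshot:
    # Hypothetical record; image_ref stands in for whatever handle the app uses.
    seq: int        # capture order within the turn (0 = initial screenshot)
    reason: str     # e.g. "request_start" or "deictic:right here"
    image_ref: str


@dataclass
class TurnContext:
    transcript: str = ""
    screenshots: list[Screenshot] = field(default_factory=list)

    def add_screenshot(self, image_ref: str, reason: str) -> Screenshot:
        # Append rather than replace, so earlier captures stay in context.
        shot = Screenshot(seq=len(self.screenshots), reason=reason, image_ref=image_ref)
        self.screenshots.append(shot)
        return shot

    def to_payload(self) -> dict:
        # List screenshots in capture order, each labeled with why it was taken,
        # so the model can tell the initial screenshot from later reference shots.
        return {
            "transcript": self.transcript,
            "screenshots": [
                {"seq": s.seq, "reason": s.reason, "image": s.image_ref}
                for s in self.screenshots
            ],
        }
```

For example, a turn that starts with `add_screenshot("initial.png", "request_start")` and later calls `add_screenshot("ref-1.png", "deictic:right here")` produces a payload with both images at `seq` 0 and 1. The explicit `reason` field is the assumed mechanism for the ordering metadata the issue asks for.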

Example

User says:

Make this look more like the screenshot here.

or:

No, I mean right here — this part.

Clicky should take a new screenshot at that moment and include it in the model context so the assistant can resolve what “this” or “right here” refers to.
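Detecting the phrases in the two example utterances could be as simple as a word-boundary regex over the transcript. The Python sketch below assumes a hypothetical `find_deictic_references` helper and a phrase list taken from this issue; Clicky's actual transcript pipeline is not described here. Longer phrases are tried first so "right here" is reported as one reference rather than as "here".

```python
import re

# Phrase list drawn from this issue; a real list would need tuning.
DEICTIC_PHRASES = ["right here", "over there", "here", "this", "that"]

# Longest phrases first so "right here" wins over "here"; \b stops matches
# inside other words (e.g. "here" in "where", "this" in "thistle").
_ALTERNATION = "|".join(
    re.escape(p) for p in sorted(DEICTIC_PHRASES, key=len, reverse=True)
)
_PATTERN = re.compile(rf"\b({_ALTERNATION})\b", re.IGNORECASE)


def find_deictic_references(transcript: str) -> list[str]:
    """Return the visual-reference phrases found in a transcript, in order."""
    return [m.group(1).lower() for m in _PATTERN.finditer(transcript)]
```

Applied to "Make this look more like the screenshot here." it reports "this" and "here"; applied to "No, I mean right here, this part." it reports "right here" and "this". A bare "this" or "that" also shows up in ordinary speech ("that makes sense"), so a matcher this naive would over-trigger on its own; weighting corrections and pointing, plus rate-limiting, would still be needed.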

Why this matters

Users naturally refer to things on screen with spatial language. Capturing a new screenshot when those phrases appear would make Clicky feel more context-aware, reduce clarification loops, and make visual editing/debugging tasks much more reliable.

Possible implementation direction

  • Add lightweight detection for deictic/visual-reference phrases in the voice transcript.
  • Trigger an immediate screenshot capture when the phrase appears, especially if the user is correcting, pointing, comparing, or emphasizing something.
  • Attach the new screenshot to the active model request or the next turn payload.
  • Consider rate-limiting or debouncing so repeated phrases do not spam screenshots.
  • Preserve ordering metadata so the model can distinguish the original screenshot from the “right here” screenshot.
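For the rate-limiting point, a leading-edge throttle is one of the two options the list mentions. It is sketched below in Python with a hypothetical `DeicticCaptureTrigger` class; the `min_interval_s` parameter and its 1.5-second default are arbitrary assumptions, not Clicky settings. The first reference phrase captures immediately, and further phrases within the interval are ignored.

```python
from typing import Optional


class DeicticCaptureTrigger:
    """Decides whether a detected reference phrase should trigger a capture.

    Hypothetical helper: min_interval_s is an assumed tuning knob.
    """

    def __init__(self, min_interval_s: float = 1.5) -> None:
        self.min_interval_s = min_interval_s
        self._last_capture_at: Optional[float] = None

    def should_capture(self, now_s: float) -> bool:
        # Leading-edge throttle: capture on the first phrase, then ignore
        # phrases arriving within min_interval_s of the last capture.
        if (
            self._last_capture_at is not None
            and now_s - self._last_capture_at < self.min_interval_s
        ):
            return False
        self._last_capture_at = now_s
        return True
```

With the default interval, phrases at 0.0 s, 0.4 s, 1.0 s, 2.0 s, 2.5 s, and 4.0 s produce captures only at 0.0 s, 2.0 s, and 4.0 s. A leading-edge throttle keeps the capture close to the moment of reference, which is what the issue asks for. A trailing-edge debounce would instead wait until the user stops repeating phrases, at the cost of capturing later.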

Metadata

Assignees: No one assigned
Labels: No labels
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests