Feature request: Capture contextual screenshots for deictic references like “here” or “right here”
Summary
Clicky should automatically capture and pass a fresh screenshot when the user uses contextual/deictic language such as “here,” “right here,” “this,” “that,” “over there,” or similar phrases that point to something on screen.
Current behavior
- Clicky receives the initial screenshot context when a request starts.
- If the user verbally points at something with language like “right here,” the agent may not get a new screenshot at the exact moment of reference.
- This can make the request ambiguous, especially when the user is emphasizing or clarifying a specific visible UI element.
Expected behavior
- When Clicky detects visual-reference phrases like “here,” “right here,” “this button,” “that thing,” or “over there,” it should capture another screenshot immediately.
- The fresh screenshot should be included with the next agent/assistant turn as additional context.
- This should work during both normal companion conversations and Clicky Agent/Codex handoffs.
- The new screenshot should be treated as additive context, not a replacement for the transcript or prior screenshots.
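The additive behavior above could be modeled as a per-turn context that only ever appends screenshots. This is a minimal sketch under assumptions: `Screenshot`, `TurnContext`, and all of their fields are hypothetical names chosen for illustration, not Clicky's actual types.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Screenshot:
    image_id: str       # hypothetical handle to the captured image
    captured_at: float  # seconds since the request started
    reason: str         # e.g. "initial" or "deictic"


@dataclass
class TurnContext:
    transcript: List[str] = field(default_factory=list)
    screenshots: List[Screenshot] = field(default_factory=list)

    def add_screenshot(self, shot: Screenshot) -> None:
        # Append only: earlier screenshots and the transcript are never replaced.
        self.screenshots.append(shot)


ctx = TurnContext(transcript=["Make this look more like the screenshot here."])
ctx.add_screenshot(Screenshot("shot-0", 0.0, "initial"))
ctx.add_screenshot(Screenshot("shot-1", 4.2, "deictic"))
```

Keeping a `reason` and a timestamp on each entry is one way to let the model tell the original screenshot apart from the one taken at the moment of reference.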
Example
User says:
Make this look more like the screenshot here.
or:
No, I mean right here — this part.
Clicky should take a new screenshot at that moment and include it in the model context so the assistant can resolve what “this” or “right here” refers to.
Why this matters
Users naturally refer to things on screen with spatial language. Capturing a new screenshot when those phrases appear would make Clicky feel more context-aware, reduce clarification loops, and make visual editing/debugging tasks much more reliable.
Possible implementation direction
- Add lightweight detection for deictic/visual-reference phrases in the voice transcript.
- Trigger an immediate screenshot capture when the phrase appears, especially if the user is correcting, pointing, comparing, or emphasizing something.
- Attach the new screenshot to the active model request or the next turn payload.
- Consider rate-limiting or debouncing so repeated phrases do not spam screenshots.
- Preserve ordering metadata so the model can distinguish the original screenshot from the “right here” screenshot.
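The detection and rate-limiting ideas above could be sketched roughly as follows. This is an illustration under assumptions, not a proposed final design: the class name `DeicticScreenshotTrigger`, the phrase list, and the 2-second minimum interval are all hypothetical. A plain regex will also fire on non-pointing uses of "this" and "that" (for example "I think that works"), so a real implementation would likely need a stronger signal.

```python
import re
from typing import Optional

# Phrases taken from the request text; longer phrases are listed first.
# Assumption: simple word-boundary matching is good enough for a first pass.
DEICTIC_PATTERN = re.compile(
    r"\b(?:right here|over there|here|this|that)\b", re.IGNORECASE
)


class DeicticScreenshotTrigger:
    """Decides whether a transcript chunk should trigger a new screenshot."""

    def __init__(self, min_interval_s: float = 2.0) -> None:
        self.min_interval_s = min_interval_s  # hypothetical default
        self._last_capture_at: Optional[float] = None

    def should_capture(self, transcript_chunk: str, now: float) -> bool:
        if not DEICTIC_PATTERN.search(transcript_chunk):
            return False
        # Rate limit: skip if a capture happened too recently.
        if (
            self._last_capture_at is not None
            and now - self._last_capture_at < self.min_interval_s
        ):
            return False
        self._last_capture_at = now
        return True


trigger = DeicticScreenshotTrigger(min_interval_s=2.0)
first = trigger.should_capture("No, I mean right here", now=10.0)
repeat = trigger.should_capture("this part", now=10.5)
later = trigger.should_capture("put it over there", now=13.0)
unrelated = trigger.should_capture("make the header bigger", now=20.0)
```

In this sketch the second phrase arrives half a second after the first, so it is suppressed, while the phrase three seconds later triggers a new capture. The caller supplies `now`, which keeps the logic easy to test without a real clock.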