Skip to content

v0.9.5

Choose a tag to compare

@adi-wan-askui adi-wan-askui released this 29 Jul 16:10
· 703 commits to main since this release

What's Changed

🚀 New Features

  • Computer Tool support of "cursor_position" action": VisionAgent can now retrieve the current cursor position, e.g., to answer questions like Is the cursor currently hovering over the star icon button?

  • Multiple Display Support: Comprehensive multi-display functionality for computer vision agents

    • New Display Management Tools ⟶ e.g., VisionAgent searches now through all available displays if something cannot be found on one:
      • ListDisplaysTool: List all available displays with their properties
      • SetActiveDisplayTool: Set the active display for screenshots and actions
      • RetrieveActiveDisplayTool: Get information about the currently active display
  • AskUI Controller Path Configuration: Enhanced flexibility in controller setup

    • Direct Path Setting: New ASKUI_CONTROLLER_PATH environment variable for direct controller executable specification
    • Priority-Based Resolution: Controller path resolution with precedence: direct path > component registry > installation directory
    • Cross-Platform Support: Improved path resolution for Windows, macOS, and Linux
    • Better Error Handling: Clear error messages for missing or invalid controller paths
  • Modernized GRPC Controller Architecture: Major overhaul of the controller communication system

    • JSON Schema Integration: New JSON schema definitions for AgentOS-Send-Request-2501 and AgentOS-Send-Response-2501
    • Automated Code Generation:
      • New grpc:gen and json:gen PDM scripts for automated code generation
      • Generated Python classes from JSON schemas using datamodel-code-generator
      • Updated GRPC bindings with enhanced type safety
    • Enhanced Command Helpers: New command_helpers.py module with functions for:
      • Mouse position management (create_get_mouse_position_command, create_set_mouse_position_command)
      • Render object management (quad, line, image, text commands)
      • Styling system with create_style function supporting CSS-like properties
    • Improved Proto Definitions: Updated Controller_V1.proto with expanded command support

🔧 Improvements

  • Development Workflow Enhancements:
    • New Dev Dependency Group: Separated development dependencies (datamodel-code-generator, grpcio-tools) into dedicated group
    • Automated Generation Scripts: New scripts/grpc-gen.sh for GRPC code generation

🐞 Bug Fixes

  • Unicode Encoding: Fixed encoding in chat api persistence layer

🔄 Dependencies

  • Moved to Dev Dependencies:
    • grpcio-tools>=1.73.1: Moved from core to dev dependencies (was >=1.67.0)
    • datamodel-code-generator>=0.31.2: Added for automated code generation

Full Changelog: v0.9.4...v0.9.5