Sci-Agent-SPM


A user-facing automation agent for running Scanning Probe Microscope (SPM) experiments with a CLI-first control path:

  • Instrument CLI driver first: the agent prefers structured driver calls (cli_get, cli_set, cli_ramp, cli_action) for reliable control.
  • GUI as auxiliary path: anchors/ROIs remain available for long-tail operations that are not exposed by the driver.
  • Long-horizon execution: ReAct-based operation for multi-step experiments that can run stably for extended sessions.
  • Structured context + modular memory: persistent sessions, run memory, and automatic memory compression.


What It Can Do

Sci-Agent-SPM is designed for real lab workflows where core control should come from instrument drivers, while GUI automation remains available as a backup channel.

  • Run driver operations with policy guardrails using workspace-approved CLI parameters/actions (single-driver or multi-driver via explicit cli_name).
  • Verify outcomes automatically using linked observables across both channels (linked_observables).
  • Use GUI anchors/ROIs when needed for controls that are not yet mapped to a driver.
  • Wait like an operator with ROI-aware sleeps (wait_until) that can follow visible countdowns.
  • Run in different modes:
    • react: ReAct loop automation with tools (CLI + GUI actions + post-action observation/think).
    • plan_execute: two-pass fast execution (optional one-time precheck, then deterministic serial tool calls).
    • chat: model-only reasoning and planning (no UI side effects).

How It Runs (Architecture)

At runtime, the agent runs a tight tool loop over instrument state and screen state:

  • CLI execute (primary): deterministic driver calls via src/cli_adapter.py (no free-form shell generation).
  • Capture/GUI act (auxiliary): ROI screenshots (src/capture.py) plus mouse/keyboard actions via pyautogui (src/actions.py).
  • Decide: two-model design (src/agent.py):
    • agent_model: decides what to do next and updates structured memory.
    • tool_call_model: cheap helper for ROI reading / waiting decisions.
  • Tools: exposed through an in-process MCP server so schemas are discoverable and the agent can “call tools” in a controlled way (src/mcp_server.py).
  • Memory: structured session memory with optional “keep last N turns” and on-demand / threshold-based compression (/compress_memory).
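This decide/act loop can be sketched roughly as follows. All names below are hypothetical stand-ins, not the actual src/agent.py API; the real agent calls tools through the in-process MCP server.

```python
# Minimal sketch of the observe -> decide -> act loop (illustrative only;
# the real implementation lives in src/agent.py and src/mcp_server.py).
# `agent_model` and the tool names here are hypothetical stand-ins.

def run_react_loop(agent_model, tools, max_steps):
    """Run a bounded ReAct loop: observe, decide, act, record, repeat."""
    memory = []                                      # structured session memory (simplified)
    for step in range(max_steps):
        observation = tools["observe"]()             # ROI screenshot / CLI readback
        decision = agent_model(memory, observation)  # pick the next tool + arguments
        if decision["tool"] == "done":
            return memory
        result = tools[decision["tool"]](**decision["args"])
        memory.append({"step": step, "tool": decision["tool"], "result": result})
    return memory                                    # step budget exhausted
```

The `max_steps` bound mirrors the `/max_agent_steps` setting: the loop always terminates even if the model never decides it is done.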

Quickstart (Recommended)

From the repo root, run the bootstrap script:

.\Sci-Agent-SPM.ps1

It will:

  • create .venv if needed
  • install dependencies from requirements.txt
  • create workspace.json from workspace.example.json (if missing)
  • load provider credentials from env (OPENAI_API, OPENAI_API_KEY, GEMINI_API_KEY, GOOGLE_API_KEY, ANTHROPIC_API_KEY, XAI_API_KEY, ARK_API_KEY)
  • load OPENAI_API / OPENAI_API_KEY from local .env if present
  • if no API credential is found but sessions/.provider_auth.json has OpenAI ChatGPT OAuth credentials, continue without prompting
  • otherwise prompt for an OpenAI key
  • start the TUI

Optional: install a user-level Sci-Agent-SPM command (copies a shim to ~\.local\bin and adds it to your user PATH):

.\tools\install_sci_agent_spm.ps1

Service Mode (SPM Agent Service + Hub)

This mode runs a long-lived HTTP service on the SPM machine and a local client hub on the user machine. The service is UI-free (no TUI), and the client provides a read-only TUI plus MCP tools for coding agents.

1) Start the SPM Agent Service (on the instrument machine)

.\Sci-Agent-SPM-Service.ps1 -Config service_config.json

If service_config.json or service_users.json is missing, the script will create them from the examples:

  • service_config.example.json
  • service_users.example.json

Update service_users.json with a new password hash before use. You can generate one with:

python -c "from src.service_auth import hash_password; print(hash_password('your_password_here'))"

Example service_users.json entry:

{
  "users": [
    {
      "username": "admin",
      "password_hash": "pbkdf2_sha256$100000$...$..."
    }
  ],
  "tokens": []
}

Notes:

  • Restart the service after editing service_users.json (loaded on startup).
  • To create a user, add a new object to users with a username and its password_hash (the hash does not include the username).
  • You can add multiple users; each user has its own password_hash (reuse only if you intend identical passwords).
  • tokens is an optional list of bearer tokens. If you add one, clients can authenticate with Authorization: Bearer <token> instead of username/password.
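The pbkdf2_sha256$100000$&lt;salt&gt;$&lt;hash&gt; shape shown above suggests a standard PBKDF2-HMAC-SHA256 encoding. A minimal standard-library sketch of how such a string can be produced and checked (illustrative only; for real entries use the src.service_auth hash_password command shown above):

```python
import base64
import hashlib
import hmac
import os

def make_hash(password, iterations=100_000):
    """Build a pbkdf2_sha256$<iterations>$<salt>$<hash> string (illustrative)."""
    salt = base64.b64encode(os.urandom(16)).decode()  # base64 never contains "$"
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), iterations)
    return f"pbkdf2_sha256${iterations}${salt}${base64.b64encode(dk).decode()}"

def check_hash(password, encoded):
    """Re-derive the key from the stored salt and compare in constant time."""
    algorithm, iterations, salt, expected = encoded.split("$")
    dk = hashlib.pbkdf2_hmac("sha256", password.encode(), salt.encode(), int(iterations))
    return hmac.compare_digest(base64.b64encode(dk).decode(), expected)
```

Because the salt is random, two users with the same password still get different `password_hash` values.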

Server dashboard (user management + calibration tool):

http://<server-host>:59341/dashboard

The dashboard requires the same auth as the API. Calibration can only be launched from this dashboard.

2) Start the Client Hub (on the user machine)

Recommended (one command): copy sci_agent_client and Sci-Agent-SPM-Hub.ps1 to the user machine, then run:

.\Sci-Agent-SPM-Hub.ps1

It will:

  • create a local .venv and install client dependencies
  • prompt for server base URL + username/password
  • validate credentials (distinguishing an unknown user from a wrong password)
  • save config to ~\.sci_agent_client\config.json
  • start the MCP hub (serve)

If a saved base URL is reachable, the script reuses it; you can then choose to reuse saved credentials or enter new ones.

Manual steps (advanced):

python -m venv .venv
.\.venv\Scripts\python -m pip install -r sci_agent_client\requirements.txt
.\.venv\Scripts\python -m sci_agent_client.cli config --base-url http://<server-host>:59341 --username <user> --password <pass>
.\.venv\Scripts\python -m sci_agent_client.cli serve

Optional convenience shims (Windows):

.\sci_agent_client\spm-hub.cmd serve

Note: the MCP hub must be running before OpenCode/Codex/Claude Code can connect; the coding agent cannot auto-start it by itself.

Workspace safety note: the server always uses its configured default workspace (default_workspace_path in service_config.json). Client submissions cannot override the workspace.

3) Read-only TUI monitor (local)

.\sci_agent_client\spm-tui.cmd --job <job_id>

4) MCP config example (OpenCode / Codex CLI / Claude Code)

{
  "mcpServers": {
    "spm-hub": {
      "url": "http://127.0.0.1:59351/sse"
    }
  }
}

Service HTTP API (Summary)

States: queued | running | waiting | done | error | canceled

Endpoints:

  • POST /jobs -> create job (returns job_id + state)
  • GET /jobs/{job_id} -> status + timestamps
  • POST /jobs/{job_id}/cancel -> cancel
  • GET /jobs -> list recent jobs
  • GET /jobs/{job_id}/session -> full canonical session.json
  • GET /jobs/{job_id}/session_delta?after_rev=N -> incremental session deltas
  • GET /dashboard -> admin dashboard (user management + calibration)
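A small client sketch for the job lifecycle above. The endpoint paths and the job_id/state fields come from the summary; the POST body field (task) and the injectable fetch callable are assumptions made for illustration and testing:

```python
import json
import time
import urllib.request

TERMINAL = {"done", "error", "canceled"}  # states from the summary above

def submit_and_wait(base_url, task, auth_header, fetch=None, poll_s=2.0):
    """Create a job via POST /jobs, then poll GET /jobs/{job_id} until terminal.

    `fetch(method, url, body)` is injectable for testing; by default it is a
    thin urllib wrapper sending the given Authorization header.
    """
    if fetch is None:
        def fetch(method, url, body=None):
            data = json.dumps(body).encode() if body is not None else None
            req = urllib.request.Request(
                url, data=data, method=method,
                headers={"Authorization": auth_header,
                         "Content-Type": "application/json"})
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)

    job = fetch("POST", f"{base_url}/jobs", {"task": task})
    while True:
        status = fetch("GET", f"{base_url}/jobs/{job['job_id']}")
        if status["state"] in TERMINAL:
            return status
        time.sleep(poll_s)
```

For long jobs, GET /jobs/{job_id}/session_delta?after_rev=N is the cheaper way to follow progress between polls.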

Setup (Manual)

Prerequisites

  • Windows 10/11
  • Python 3.11+ on PATH
  • Instrument CLI driver executable(s) available on PATH (for example nqctl, cryocli)
  • Your SPM control software running with a stable monitor layout (needed for GUI fallback operations)
  • Provider credentials for your chosen /connect workflow (OpenAI ChatGPT OAuth or provider API key env vars)

1) Create a venv + install deps

python -m venv .venv
.\.venv\Scripts\python -m pip install --upgrade pip
.\.venv\Scripts\python -m pip install -r requirements.txt

2) Configure your workspace.json (CLI first, GUI optional)

Your workspace is the policy boundary for what the agent can control. Start from CLI capabilities, then add GUI entries only where needed.

Option A (recommended): import capabilities from your instrument CLI driver via the calibrator:

.\.venv\Scripts\python -m src.calibrate_gui --workspace workspace.json

In service mode, use the calibrator menu "Set agent workspace" to update service_config.json (default_workspace_path).

In the calibrator:

  • Set CLI Name to your driver executable (for example nqctl)
  • Click Load Param From CLI to import supported CLI parameters/actions
  • Enable only the entries you want the agent to use
  • Add Linked Observables where you want automatic post-action verification
  • Click Save

Option B (optional): add GUI fallback anchors/ROIs in the calibrator:

  • Add/select an ROI or Anchor in the left list
  • Click Draw ROI box (drag a rectangle) or Pick anchor point (single click)
  • Add descriptions so the model can pick the correct GUI control
  • Link observables to anchors for automatic verification after GUI actions

Option C: edit JSON directly:

  1. Copy workspace.example.json to workspace.json
  2. Edit:
    • tools.cli.enabled: turn CLI channel on/off
    • tools.cli.parameters and tools.cli.actions: allowlisted driver capabilities
    • per-entry CLI_Name: owning driver name/executable (supports multi-driver workspaces)
    • rois / anchors: GUI fallback controls
    • linked_observables: observables to check after actions for automatic verification
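A minimal illustrative shape for those fields (the top-level keys come from the list above, but the structure of individual entries is a guess; start from workspace.example.json rather than this sketch):

```json
{
  "tools": {
    "cli": {
      "enabled": true,
      "parameters": [
        { "name": "bias", "CLI_Name": "nqctl" }
      ],
      "actions": [
        { "name": "start_scan", "CLI_Name": "nqctl" }
      ]
    }
  },
  "rois": {},
  "anchors": {},
  "linked_observables": {}
}
```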

3) Optional: set provider API key env vars (API mode)

$env:OPENAI_API = "YOUR_KEY_HERE"

Other provider env vars (API mode):

  • Gemini: GEMINI_API_KEY or GOOGLE_API_KEY
  • Claude: ANTHROPIC_API_KEY
  • Grok: XAI_API_KEY
  • Doubao: ARK_API_KEY

The bootstrap script also loads OPENAI_API / OPENAI_API_KEY from a local .env file if present.

Run

.\.venv\Scripts\python -m src.main --agent

How To Use It

1) Treat CLI as the default control path

When a matching CLI capability is enabled in the workspace, the agent prefers the CLI tools (cli_get, cli_set, cli_ramp, cli_action).

GUI tools are used only when:

  • the requested capability is not available through enabled CLI entries, or
  • your instruction is explicitly GUI-only.

2) Add anchors + ROIs only for long-tail GUI steps

For any operation that still requires the controller UI, provide:

  • an anchor like bias_input (where to click/type)
  • ROIs like bias_readout, scan_status, scan_time_count_down (what to verify / wait on)

Link the right observables to the right anchors to make verification automatic.
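The wait_until behavior can be pictured as a simple poll loop over an ROI (illustrative only; read_roi and is_done are hypothetical stand-ins for the ROI reader and the stop condition the tool_call_model evaluates):

```python
import time

def wait_until(read_roi, is_done, timeout_s=600.0, poll_s=2.0):
    """Poll an ROI reader (e.g. scan_status) until `is_done` accepts its text,
    or raise TimeoutError. Illustrative stand-in for the wait_until tool."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        text = read_roi()          # e.g. OCR of the scan_status ROI
        if is_done(text):
            return text
        time.sleep(poll_s)         # ROI-aware sleep between checks
    raise TimeoutError("ROI never reached the expected state")
```

A visible countdown ROI like scan_time_count_down lets the agent choose a longer sleep instead of polling blindly.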

3) Give it a real operator command

In react mode, you can ask for sequences like:

  • “Set bias to 500 mV, start one topography scan, wait until status returns to <idle>, then set 400 mV and repeat.”

In plan_execute mode, use short, time-sensitive sequences that must run quickly with no observe/think between execution steps.

In chat mode, ask for planning, SOP drafting, or sanity checks without touching the control software.

4) Connect provider auth (slash command or command palette)

Use /connect to select the provider + auth mode before runs. You can type it directly in the input box, or open the command palette (Ctrl+P) and select /connect to choose the provider and mode interactively.

Supported connect workflows:

  • /connect openai chatgpt/codex
  • /connect openai api
  • /connect gemini api
  • /connect claude api
  • /connect grok api
  • /connect doubao api

OpenAI ChatGPT fallback behavior:

  • If /connect openai chatgpt/codex fails, the TUI shows a fallback prompt:
    • 1 retry OpenAI ChatGPT connect
    • 2 switch to OpenAI API mode (/connect openai api)
    • 3 cancel

5) TUI controls (slash commands)

In the TUI, type /help (or /menu) to show commands. Settings persist in sessions/.tui_settings.json.

Core settings

  • /workspace [path]: get/set the active workspace file.
  • /mode: show current mode.
  • /mode chat|react|plan_execute: set execution mode.
  • /agent_model [name] (alias: /model): get/set the main “decision” model.
  • /tool_call_model [name]: get/set the perception/tool helper model.
  • /max_agent_steps [int]: limit steps per run (prevents runaway loops).
  • /action_delay [seconds]: delay between UI actions (stability vs speed).
  • /abort_hotkey [on|off]: enable/disable cooperative abort from the TUI (Ctrl+C).
  • /log_dir [path]: set where runs write logs (default: logs).

Memory

  • /memory_turn: show current memory_turns.
  • /set_memory_turn [-1|N]: -1 = full memory; 0 = none; N = keep last N entries.
  • /memory_compress_threshold [int tokens]: set auto-compress threshold (0 disables auto-compress).
  • /compress_memory: manually compress memory now (moves details to archive and keeps summaries).
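The /set_memory_turn semantics above amount to a small trimming rule, sketched here for clarity (illustrative; the real behavior lives in the session memory code):

```python
def trim_memory(entries, memory_turns):
    """Apply /set_memory_turn semantics: -1 keeps the full history,
    0 keeps nothing, and N > 0 keeps only the last N entries."""
    if memory_turns == -1:
        return list(entries)
    if memory_turns == 0:
        return []
    return list(entries)[-memory_turns:]
```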

Sessions

  • /chat new: start a new session (clears transcript + memory).
  • /chat save [name]: save current transcript + agent state to sessions/<name>.json.
  • /chat list: list saved sessions.
  • /chat resume <name>: load a saved session.

Provider auth

  • /connect: show current provider auth status summary.
  • /connect <openai|gemini|claude|grok|doubao> <api|chatgpt/codex>: connect provider mode (chatgpt/codex is only valid for openai).

Maintenance

  • /calibration_tool: launch the calibrator for the current workspace.
  • /clear_cache: delete log folders on disk (asks for confirmation).

6) TUI key bindings

  • Enter: newline
  • Ctrl+S: send input
  • Ctrl+I: focus input
  • Ctrl+L: focus transcript
  • Ctrl+P: open command palette
  • Shift+Mouse: select/copy transcript (Esc to close)
  • Ctrl+Q: quit
  • PageUp / PageDown: scroll transcript
  • Ctrl+C: request abort (when /abort_hotkey is ON)

7) Logs (audit trail)

Each program run creates a timestamped folder under logs/ (or your configured /log_dir).

Typical structure:

logs/<YYYYMMDD_HHMMSS>/

  • cli_get_<parameter>/meta.json
  • cli_set_<parameter>/meta.json
  • cli_ramp_<parameter>/meta.json
  • cli_action_<action_name>/meta.json
  • click_<anchor>/meta.json
  • set_field_<anchor>/meta.json
  • wait_until_<roi>_sleep/meta.json + before_<roi>.png
  • observe_<attempt>_<roi>/meta.json + roi_<roi>.png

These logs are designed to make automation reviewable: you can inspect exactly what the agent saw and did.

Safety

This project can control instrument state (CLI) and your mouse/keyboard (GUI fallback).

  • PyAutoGUI failsafe: move the mouse to the top-left corner to trigger pyautogui.FAILSAFE.
  • Prefer running on a dedicated machine / dedicated desktop session so other apps don’t steal focus.
  • Start with conservative /action_delay and small /max_agent_steps until your workspace is calibrated and stable.
