A user-facing automation agent for running Scanning Probe Microscope (SPM) experiments with a CLI-first control path:
- Instrument CLI driver first: the agent prefers structured driver calls (`cli_get`, `cli_set`, `cli_ramp`, `cli_action`) for reliable control.
- GUI as auxiliary path: anchors/ROIs remain available for long-tail operations that are not exposed by the driver.
- Long-horizon execution: ReAct-based operation for multi-step experiments that can run stably for extended sessions.
- Structured context + modular memory: persistent sessions, run memory, and automatic memory compression.
Sci-Agent-SPM is designed for real lab workflows where core control should come from instrument drivers, while GUI automation remains available as a backup channel.
- Run driver operations with policy guardrails using workspace-approved CLI parameters/actions (single-driver or multi-driver via explicit `cli_name`).
- Verify outcomes automatically using linked observables across both channels (`linked_observables`).
- Use GUI anchors/ROIs when needed for controls that are not yet mapped to a driver.
- Wait like an operator with ROI-aware sleeps (`wait_until`) that can follow visible countdowns.
- Run in different modes:
  - `react`: ReAct loop automation with tools (CLI + GUI actions + post-action observation/think).
  - `plan_execute`: two-pass fast execution (optional one-time precheck, then deterministic serial tool calls).
  - `chat`: model-only reasoning and planning (no UI side effects).
At runtime, the agent runs a tight tool loop over instrument state and screen state:
- CLI execute (primary): deterministic driver calls via `src/cli_adapter.py` (no free-form shell generation).
- Capture/GUI act (auxiliary): ROI screenshots (`src/capture.py`) plus mouse/keyboard actions via `pyautogui` (`src/actions.py`).
- Decide: two-model design (`src/agent.py`):
  - `agent_model`: decides what to do next and updates structured memory.
  - `tool_call_model`: cheap helper for ROI reading / waiting decisions.
- Tools: exposed through an in-process MCP server so schemas are discoverable and the agent can "call tools" in a controlled way (`src/mcp_server.py`).
- Memory: structured session memory with optional "keep last N turns" and on-demand / threshold-based compression (`/compress_memory`).
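The tool loop above can be sketched in a few lines. This is illustrative only; the dispatch, schemas, and memory format of `src/agent.py` will differ:

```python
# Minimal sketch of the decide/act loop: the decision model picks a tool,
# the tool runs, and the observation is appended to memory for the next step.
# (Illustrative -- not the actual src/agent.py implementation.)
def run_react_loop(agent_model, tools, max_steps=10):
    memory = []
    for step in range(max_steps):
        decision = agent_model(memory)          # decide what to do next
        if decision["tool"] == "stop":
            break
        result = tools[decision["tool"]](**decision.get("args", {}))
        memory.append({"step": step, "tool": decision["tool"], "result": result})
    return memory
```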
From the repo root, run the bootstrap script:
```powershell
.\Sci-Agent-SPM.ps1
```

It will:

- create `.venv` if needed
- install dependencies from `requirements.txt`
- create `workspace.json` from `workspace.example.json` (if missing)
- load provider credentials from env (`OPENAI_API`, `OPENAI_API_KEY`, `GEMINI_API_KEY`, `GOOGLE_API_KEY`, `ANTHROPIC_API_KEY`, `XAI_API_KEY`, `ARK_API_KEY`)
- load `OPENAI_API`/`OPENAI_API_KEY` from local `.env` if present
- if no API credential is found but `sessions/.provider_auth.json` has OpenAI ChatGPT OAuth credentials, continue without prompting; otherwise prompt for an OpenAI key
- start the TUI
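The credential lookup order can be sketched as follows. This is a hypothetical helper mirroring the env-var order listed above; the real logic lives in the PowerShell bootstrap:

```python
import os

# Sketch of the bootstrap's credential lookup: return the first provider env
# var that is set, in the order the docs list them. (Illustrative only.)
CRED_VARS = [
    "OPENAI_API", "OPENAI_API_KEY",
    "GEMINI_API_KEY", "GOOGLE_API_KEY",
    "ANTHROPIC_API_KEY", "XAI_API_KEY", "ARK_API_KEY",
]

def find_credential(env=os.environ):
    for name in CRED_VARS:
        if env.get(name):
            return name, env[name]
    return None
```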
Optional: install a user-level Sci-Agent-SPM command (copies a shim to ~\.local\bin and adds it to your user PATH):
```powershell
.\tools\install_sci_agent_spm.ps1
```

This mode runs a long-lived HTTP service on the SPM machine and a local client hub on the user machine. The service is UI-free (no TUI), and the client provides a read-only TUI plus MCP tools for coding agents.
```powershell
.\Sci-Agent-SPM-Service.ps1 -Config service_config.json
```

If `service_config.json` or `service_users.json` is missing, the script will create them from the examples: `service_config.example.json`, `service_users.example.json`.
Update service_users.json with a new password hash before use. You can generate one with:
```powershell
python -c "from src.service_auth import hash_password; print(hash_password('your_password_here'))"
```

Example `service_users.json` entry:
```json
{
  "users": [
    {
      "username": "admin",
      "password_hash": "pbkdf2_sha256$100000$...$..."
    }
  ],
  "tokens": []
}
```

Notes:
- Restart the service after editing `service_users.json` (loaded on startup).
- To create a user, add a new object to `users` with a `username` and its `password_hash` (the hash does not include the username).
- You can add multiple users; each user has its own `password_hash` (reuse only if you intend identical passwords).
- `tokens` is an optional list of bearer tokens. If you add one, clients can authenticate with `Authorization: Bearer <token>` instead of username/password.
Server dashboard (user management + calibration tool):
http://<server-host>:59341/dashboard
The dashboard requires the same auth as the API. Calibration can only be launched from this dashboard.
Recommended (one command): copy `sci_agent_client` and `Sci-Agent-SPM-Hub.ps1` to the user machine, then run:
```powershell
.\Sci-Agent-SPM-Hub.ps1
```

It will:

- create a local `.venv` and install client dependencies
- prompt for server base URL + username/password
- validate credentials (user not found vs wrong password)
- save config to `~\.sci_agent_client\config.json`
- start the MCP hub (`serve`)
If a saved base URL is reachable, it reuses it; you can choose to reuse saved credentials or enter new ones.
Manual steps (advanced):
```powershell
python -m venv .venv
.\.venv\Scripts\python -m pip install -r sci_agent_client\requirements.txt
.\.venv\Scripts\python -m sci_agent_client.cli config --base-url http://<server-host>:59341 --username <user> --password <pass>
.\.venv\Scripts\python -m sci_agent_client.cli serve
```

Optional convenience shims (Windows):

```powershell
.\sci_agent_client\spm-hub.cmd serve
```

Note: the MCP hub must be running before OpenCode/Codex/Claude Code can connect; the coding agent cannot auto-start it by itself.
Workspace safety note: the server always uses its configured default workspace (`default_workspace_path` in `service_config.json`). Client submissions cannot override the workspace.
```powershell
.\sci_agent_client\spm-tui.cmd --job <job_id>
```

```json
{
  "mcpServers": {
    "spm-hub": {
      "url": "http://127.0.0.1:59351/sse"
    }
  }
}
```

States: `queued` | `running` | `waiting` | `done` | `error` | `canceled`
Endpoints:
- `POST /jobs` -> create job (returns `job_id` + state)
- `GET /jobs/{job_id}` -> status + timestamps
- `POST /jobs/{job_id}/cancel` -> cancel
- `GET /jobs` -> list recent jobs
- `GET /jobs/{job_id}/session` -> full canonical `session.json`
- `GET /jobs/{job_id}/session_delta?after_rev=N` -> incremental session deltas
- `GET /dashboard` -> admin dashboard (user management + calibration)
- Windows 10/11
- Python 3.11+ on PATH
- Instrument CLI driver executable(s) available on PATH (for example `nqctl`, `cryocli`)
- Your SPM controlling software running on a stable monitor layout (needed for GUI fallback operations)
- Provider credentials for your chosen `/connect` workflow (OpenAI ChatGPT OAuth or provider API key env vars)
```powershell
python -m venv .venv
.\.venv\Scripts\python -m pip install --upgrade pip
.\.venv\Scripts\python -m pip install -r requirements.txt
```

Your workspace is the policy boundary for what the agent can control. Start from CLI capabilities, then add GUI entries only where needed.
Option A (recommended): import capabilities from your instrument CLI driver via the calibrator:
```powershell
.\.venv\Scripts\python -m src.calibrate_gui --workspace workspace.json
```

In service mode, use the calibrator menu "Set agent workspace" to update `service_config.json` (`default_workspace_path`).
In the calibrator:
- Set CLI Name to your driver executable (for example `nqctl`)
- Click Load Param From CLI to import supported CLI parameters/actions
- Enable only the entries you want the agent to use
- Add Linked Observables where you want automatic post-action verification
- Click Save
Option B (optional): add GUI fallback anchors/ROIs in the calibrator:
- Add/select an ROI or Anchor in the left list
- Click Draw ROI box (drag a rectangle) or Pick anchor point (single click)
- Add descriptions so the model can pick the correct GUI control
- Link observables to anchors for automatic verification after GUI actions
Option C: edit JSON directly:
- Copy `workspace.example.json` → `workspace.json`
- Edit:
  - `tools.cli.enabled`: turn CLI channel on/off
  - `tools.cli.parameters` and `tools.cli.actions`: allowlisted driver capabilities
  - per-entry `CLI_Name`: owning driver name/executable (supports multi-driver workspaces)
  - `rois`/`anchors`: GUI fallback controls
  - `linked_observables`: observables to observe after actions for verification
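The allowlist semantics implied by these keys can be sketched as follows. Key names follow the docs; the gating logic is illustrative, not the agent's actual check:

```python
# Hedged sketch of the policy gate implied by the workspace keys above:
# a CLI call is allowed only if the CLI channel is enabled, the parameter
# is allowlisted, and the entry belongs to the named driver.
def cli_call_allowed(workspace: dict, parameter: str, cli_name: str) -> bool:
    cli = workspace.get("tools", {}).get("cli", {})
    if not cli.get("enabled"):
        return False
    entry = cli.get("parameters", {}).get(parameter)
    return bool(entry) and entry.get("CLI_Name") == cli_name
```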
```powershell
$env:OPENAI_API = "YOUR_KEY_HERE"
```

Other provider env vars (API mode):
- Gemini: `GEMINI_API_KEY` or `GOOGLE_API_KEY`
- Claude: `ANTHROPIC_API_KEY`
- Grok: `XAI_API_KEY`
- Doubao: `ARK_API_KEY`
The bootstrap script also loads `OPENAI_API` / `OPENAI_API_KEY` from a local `.env` file if present.
```powershell
.\.venv\Scripts\python -m src.main --agent
```

When a matching CLI capability is enabled in the workspace, the agent will prefer CLI tools (`cli_get`, `cli_set`, `cli_ramp`, `cli_action`).
GUI tools are used only when:
- the requested capability is not available through enabled CLI entries, or
- your instruction is explicitly GUI-only.
For any operation that still requires the controller UI, provide:
- an anchor like `bias_input` (where to click/type)
- ROIs like `bias_readout`, `scan_status`, `scan_time_count_down` (what to verify / wait on)
Link the right observables to the right anchors to make verification automatic.
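Conceptually, a linked observable turns every set into a closed-loop check. Here is a hedged sketch with a hypothetical helper; the agent routes the readback through `cli_get` or an ROI observation:

```python
# Sketch of post-action verification via a linked observable: after a set,
# read back the linked readout and accept it when it lands within a relative
# tolerance of the setpoint, retrying a few times to allow settling.
# (Hypothetical helper -- not the agent's actual verification code.)
def verify_linked(setpoint: float, read_observable, rel_tol: float = 0.01,
                  retries: int = 3) -> bool:
    for _ in range(retries):
        value = read_observable()
        if abs(value - setpoint) <= abs(setpoint) * rel_tol:
            return True
    return False
```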
In react mode, you can ask for sequences like:
- “Set bias to 500 mV, start one topography scan, wait until status returns to `<idle>`, then set 400 mV and repeat.”
In plan_execute mode, use short, time-sensitive sequences that must run quickly with no observe/think between execution steps.
In chat mode, ask for planning, SOP drafting, or sanity checks without touching the controlling software.
Use `/connect` to select provider + auth mode before runs. You can type it directly in the input box, or open the command palette (Ctrl+P) and select `/connect` to complete the provider/mode choice interactively.
Supported connect workflows:
- `/connect openai chatgpt/codex`
- `/connect openai api`
- `/connect gemini api`
- `/connect claude api`
- `/connect grok api`
- `/connect doubao api`
OpenAI ChatGPT fallback behavior:
- If `/connect openai chatgpt/codex` fails, the TUI shows a fallback prompt:
  - `1` retry OpenAI ChatGPT connect
  - `2` switch to OpenAI API mode (`/connect openai api`)
  - `3` cancel
In the TUI, type `/help` (or `/menu`) to show commands. Settings persist in `sessions/.tui_settings.json`.
Core settings
- `/workspace [path]`: get/set the active workspace file.
- `/mode`: show current mode.
- `/mode chat|react|plan_execute`: set execution mode.
- `/agent_model [name]` (alias: `/model`): get/set the main “decision” model.
- `/tool_call_model [name]`: get/set the perception/tool helper model.
- `/max_agent_steps [int]`: limit steps per run (prevents runaway loops).
- `/action_delay [seconds]`: delay between UI actions (stability vs speed).
- `/abort_hotkey [on|off]`: enable/disable cooperative abort from the TUI (Ctrl+C).
- `/log_dir [path]`: set where runs write logs (default: `logs`).
Memory
- `/memory_turn`: show current `memory_turns`.
- `/set_memory_turn [-1|N]`: `-1` = full memory; `0` = none; `N` = keep last N entries.
- `/memory_compress_threshold [int tokens]`: set auto-compress threshold (`0` disables auto-compress).
- `/compress_memory`: manually compress memory now (moves details to archive and keeps summaries).
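The interplay of `memory_turns` and the compress threshold can be sketched as follows (stub summarizer; the agent uses a model for real compression):

```python
# Sketch of the memory policy described above: keep the last N turns verbatim
# and fold older turns into a single summary once the size threshold is hit.
# -1 means full memory (never compress); 0 means keep no verbatim turns.
def compress_memory(turns, keep_last, threshold,
                    summarize=lambda old: f"[summary of {len(old)} turns]"):
    if keep_last == -1 or sum(len(t) for t in turns) < threshold:
        return turns
    recent = turns[-keep_last:] if keep_last > 0 else []
    old = turns[:len(turns) - len(recent)]
    return ([summarize(old)] if old else []) + recent
```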
Sessions
- `/chat new`: start a new session (clears transcript + memory).
- `/chat save [name]`: save current transcript + agent state to `sessions/<name>.json`.
- `/chat list`: list saved sessions.
- `/chat resume <name>`: load a saved session.
Provider auth
- `/connect`: show current provider auth status summary.
- `/connect <openai|gemini|claude|grok|doubao> <api|chatgpt/codex>`: connect provider mode (`chatgpt/codex` is only valid for `openai`).
Maintenance
- `/calibration_tool`: launch the calibrator for the current workspace.
- `/clear_cache`: delete log folders on disk (asks for confirmation).
- Enter: newline
- Ctrl+S: send input
- Ctrl+I: focus input
- Ctrl+L: focus transcript
- Ctrl+P: open command palette
- Shift+Mouse: select/copy transcript (Esc to close)
- Ctrl+Q: quit
- PageUp/PageDown: scroll transcript
- Ctrl+C: request abort (when `/abort_hotkey` is ON)
Each program run creates a timestamped folder under `logs/` (or your configured `/log_dir`).
Typical structure:
```
logs/<YYYYMMDD_HHMMSS>/
  cli_get_<parameter>/meta.json
  cli_set_<parameter>/meta.json
  cli_ramp_<parameter>/meta.json
  cli_action_<action_name>/meta.json
  click_<anchor>/meta.json
  set_field_<anchor>/meta.json
  wait_until_<roi>_sleep/meta.json + before_<roi>.png
  observe_<attempt>_<roi>/meta.json + roi_<roi>.png
```
These logs are designed to make automation reviewable: you can inspect exactly what the agent saw and did.
This project can control instrument state (CLI) and your mouse/keyboard (GUI fallback).
- PyAutoGUI failsafe: move the mouse to the top-left corner to trigger `pyautogui.FAILSAFE`.
- Prefer running on a dedicated machine / dedicated desktop session so other apps don’t steal focus.
- Start with conservative `/action_delay` and small `/max_agent_steps` until your workspace is calibrated and stable.

