A user-facing automation agent for running Scanning Probe Microscope (SPM) experiments with a CLI-first control path:
- Instrument CLI driver first: the agent prefers structured driver calls (`cli_get`, `cli_set`, `cli_ramp`, `cli_action`) for reliable control.
- GUI as auxiliary path: anchors/ROIs remain available for long-tail operations that are not exposed by the driver.
- Long-horizon execution: ReAct-based operation for multi-step experiments that can run stably for extended sessions.
- Structured context + modular memory: persistent sessions, run memory, and automatic memory compression.
Sci-Agent-SPM is designed for real lab workflows where core control should come from instrument drivers, while GUI automation remains available as a backup channel.
- Run driver operations with policy guardrails using workspace-approved CLI parameters/actions (single-driver or multi-driver via explicit `cli_name`).
- Verify outcomes automatically using linked observables across both channels (`linked_observables`).
- Use GUI anchors/ROIs when needed for controls that are not yet mapped to a driver.
- Wait like an operator with ROI-aware sleeps (`wait_until`) that can follow visible countdowns.
- Run in different modes:
  - `react`: ReAct loop automation with tools (CLI + GUI actions + post-action observation/think).
  - `plan_execute`: two-pass fast execution (optional one-time precheck, then deterministic serial tool calls).
  - `chat`: model-only reasoning and planning (no UI side effects).
At runtime, the agent runs a tight tool loop over instrument state and screen state:
- CLI execute (primary): deterministic driver calls via `src/cli_adapter.py` (no free-form shell generation).
- Capture/GUI act (auxiliary): ROI screenshots (`src/capture.py`) plus mouse/keyboard actions via `pyautogui` (`src/actions.py`).
- Decide: two-model design (`src/agent.py`):
  - `agent_model`: decides what to do next and updates structured memory.
  - `tool_call_model`: cheap helper for ROI reading / waiting decisions.
- Tools: exposed through an in-process MCP server so schemas are discoverable and the agent can "call tools" in a controlled way (`src/mcp_server.py`).
- Memory: structured session memory with optional "keep last N turns" and on-demand / threshold-based compression (`/compress_memory`).
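The tool loop above can be sketched in a few lines. This is illustrative only; the dispatch, schemas, and memory format of `src/agent.py` will differ:

```python
# Minimal sketch of the decide/act loop: the decision model picks a tool,
# the tool runs, and the observation is appended to memory for the next step.
# (Illustrative -- not the actual src/agent.py implementation.)
def run_react_loop(agent_model, tools, max_steps=10):
    memory = []
    for step in range(max_steps):
        decision = agent_model(memory)          # decide what to do next
        if decision["tool"] == "stop":
            break
        result = tools[decision["tool"]](**decision.get("args", {}))
        memory.append({"step": step, "tool": decision["tool"], "result": result})
    return memory
```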
From the repo root, run the bootstrap script:
```powershell
.\Sci-Agent-SPM.ps1
```

It will:

- create `.venv` if needed
- install dependencies from `requirements.txt`
- create `workspace.json` from `workspace.example.json` (if missing)
- load provider credentials from env (`OPENAI_API`, `OPENAI_API_KEY`, `GEMINI_API_KEY`, `GOOGLE_API_KEY`, `ANTHROPIC_API_KEY`, `XAI_API_KEY`, `ARK_API_KEY`)
- load `OPENAI_API`/`OPENAI_API_KEY` from local `.env` if present
- if no API credential is found but `sessions/.provider_auth.json` has OpenAI ChatGPT OAuth credentials, continue without prompting; otherwise prompt for an OpenAI key
- start the TUI
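The credential lookup order can be sketched as follows. This is a hypothetical helper mirroring the env-var order listed above; the real logic lives in the PowerShell bootstrap:

```python
import os

# Sketch of the bootstrap's credential lookup: return the first provider env
# var that is set, in the order the docs list them. (Illustrative only.)
CRED_VARS = [
    "OPENAI_API", "OPENAI_API_KEY",
    "GEMINI_API_KEY", "GOOGLE_API_KEY",
    "ANTHROPIC_API_KEY", "XAI_API_KEY", "ARK_API_KEY",
]

def find_credential(env=os.environ):
    for name in CRED_VARS:
        if env.get(name):
            return name, env[name]
    return None
```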
Optional: install a user-level Sci-Agent-SPM command (copies a shim to ~\.local\bin and adds it to your user PATH):
```powershell
.\tools\install_sci_agent_spm.ps1
```

This mode runs a long-lived HTTP service on the SPM machine and a local client hub on the user machine. The service is UI-free (no TUI), and the client provides a read-only TUI plus MCP tools for coding agents.
```powershell
.\Sci-Agent-SPM-Service.ps1 -Config service_config.json
```

If `service_config.json` or `service_users.json` is missing, the script will create them from the examples: `service_config.example.json`, `service_users.example.json`.
Update service_users.json with a new password hash before use. You can generate one with:
```powershell
python -c "from src.service_auth import hash_password; print(hash_password('your_password_here'))"
```

Example `service_users.json` entry:
```json
{
  "users": [
    {
      "username": "admin",
      "password_hash": "pbkdf2_sha256$100000$...$..."
    }
  ],
  "tokens": []
}
```

Notes:
- Restart the service after editing `service_users.json` (loaded on startup).
- To create a user, add a new object to `users` with a `username` and its `password_hash` (the hash does not include the username).
- You can add multiple users; each user has its own `password_hash` (reuse only if you intend identical passwords).
- `tokens` is an optional list of bearer tokens. If you add one, clients can authenticate with `Authorization: Bearer <token>` instead of username/password.
Server dashboard (user management + calibration tool):
http://<server-host>:59341/dashboard
The dashboard requires the same auth as the API. Calibration can only be launched from this dashboard.
Recommended (one command): copy `sci_agent_client` and `Sci-Agent-SPM-Hub.ps1` to the user machine, then run:
```powershell
.\Sci-Agent-SPM-Hub.ps1
```

It will:

- create a local `.venv` and install client dependencies
- prompt for server base URL + username/password
- validate credentials (user not found vs wrong password)
- save config to `~\.sci_agent_client\config.json`
- start the MCP hub (`serve`)
If a saved base URL is reachable, it reuses it; you can choose to reuse saved credentials or enter new ones.
Manual steps (advanced):
```powershell
python -m venv .venv
.\.venv\Scripts\python -m pip install -r sci_agent_client\requirements.txt
.\.venv\Scripts\python -m sci_agent_client.cli config --base-url http://<server-host>:59341 --username <user> --password <pass>
.\.venv\Scripts\python -m sci_agent_client.cli serve
```

Optional convenience shims (Windows):

```powershell
.\sci_agent_client\spm-hub.cmd serve
```

Note: the MCP hub must be running before OpenCode/Codex/Claude Code can connect; the coding agent cannot auto-start it by itself.
Workspace safety note: the server always uses its configured default workspace (`default_workspace_path` in `service_config.json`). Client submissions cannot override the workspace.
```powershell
.\sci_agent_client\spm-tui.cmd --job <job_id>
```

```json
{
  "mcpServers": {
    "spm-hub": {
      "url": "http://127.0.0.1:59351/sse"
    }
  }
}
```

States: `queued` | `running` | `waiting` | `done` | `error` | `canceled`
Endpoints:
- `POST /jobs` -> create job (returns `job_id` + state)
- `GET /jobs/{job_id}` -> status + timestamps
- `POST /jobs/{job_id}/cancel` -> cancel
- `GET /jobs` -> list recent jobs
- `GET /jobs/{job_id}/session` -> full canonical `session.json`
- `GET /jobs/{job_id}/session_delta?after_rev=N` -> incremental session deltas
- `GET /dashboard` -> admin dashboard (user management + calibration)
- Windows 10/11
- Python 3.11+ on PATH
- Instrument CLI driver executable(s) available on PATH (for example `nqctl`, `cryocli`)
- Your SPM controlling software running on a stable monitor layout (needed for GUI fallback operations)
- Provider credentials for your chosen `/connect` workflow (OpenAI ChatGPT OAuth or provider API key env vars)
```powershell
python -m venv .venv
.\.venv\Scripts\python -m pip install --upgrade pip
.\.venv\Scripts\python -m pip install -r requirements.txt
```

Your workspace is the policy boundary for what the agent can control. Start from CLI capabilities, then add GUI entries only where needed.
Option A (recommended): import capabilities from your instrument CLI driver via the calibrator:
```powershell
.\.venv\Scripts\python -m src.calibrate_gui --workspace workspace.json
```

In service mode, use the calibrator menu "Set agent workspace" to update `service_config.json` (`default_workspace_path`).
In the calibrator:
- Set CLI Name to your driver executable (for example `nqctl`)
- Click Load Param From CLI to import supported CLI parameters/actions
- Enable only the entries you want the agent to use
- Add Linked Observables where you want automatic post-action verification
- Click Save
Option B (optional): add GUI fallback anchors/ROIs in the calibrator:
- Add/select an ROI or Anchor in the left list
- Click Draw ROI box (drag a rectangle) or Pick anchor point (single click)
- Add descriptions so the model can pick the correct GUI control
- Link observables to anchors for automatic verification after GUI actions
Option C: edit JSON directly:
- Copy `workspace.example.json` → `workspace.json`
- Edit:
  - `tools.cli.enabled`: turn CLI channel on/off
  - `tools.cli.parameters` and `tools.cli.actions`: allowlisted driver capabilities
  - per-entry `CLI_Name`: owning driver name/executable (supports multi-driver workspaces)
  - `rois`/`anchors`: GUI fallback controls
  - `linked_observables`: observables to observe after actions for verification
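The allowlist semantics implied by these keys can be sketched as follows. Key names follow the docs; the gating logic is illustrative, not the agent's actual check:

```python
# Hedged sketch of the policy gate implied by the workspace keys above:
# a CLI call is allowed only if the CLI channel is enabled, the parameter
# is allowlisted, and the entry belongs to the named driver.
def cli_call_allowed(workspace: dict, parameter: str, cli_name: str) -> bool:
    cli = workspace.get("tools", {}).get("cli", {})
    if not cli.get("enabled"):
        return False
    entry = cli.get("parameters", {}).get(parameter)
    return bool(entry) and entry.get("CLI_Name") == cli_name
```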
```powershell
$env:OPENAI_API = "YOUR_KEY_HERE"
```

Other provider env vars (API mode):
- Gemini: `GEMINI_API_KEY` or `GOOGLE_API_KEY`
- Claude: `ANTHROPIC_API_KEY`
- Grok: `XAI_API_KEY`
- Doubao: `ARK_API_KEY`
The bootstrap script also loads `OPENAI_API` / `OPENAI_API_KEY` from a local `.env` file if present.
```powershell
.\.venv\Scripts\python -m src.main --agent
```

When a matching CLI capability is enabled in the workspace, the agent will prefer CLI tools (`cli_get`, `cli_set`, `cli_ramp`, `cli_action`).
GUI tools are used only when:
- the requested capability is not available through enabled CLI entries, or
- your instruction is explicitly GUI-only.
For any operation that still requires the controller UI, provide:
- an anchor like `bias_input` (where to click/type)
- ROIs like `bias_readout`, `scan_status`, `scan_time_count_down` (what to verify / wait on)
Link the right observables to the right anchors to make verification automatic.
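Conceptually, a linked observable turns every set into a closed-loop check. Here is a hedged sketch with a hypothetical helper; the agent routes the readback through `cli_get` or an ROI observation:

```python
# Sketch of post-action verification via a linked observable: after a set,
# read back the linked readout and accept it when it lands within a relative
# tolerance of the setpoint, retrying a few times to allow settling.
# (Hypothetical helper -- not the agent's actual verification code.)
def verify_linked(setpoint: float, read_observable, rel_tol: float = 0.01,
                  retries: int = 3) -> bool:
    for _ in range(retries):
        value = read_observable()
        if abs(value - setpoint) <= abs(setpoint) * rel_tol:
            return True
    return False
```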
In react mode, you can ask for sequences like:
- “Set bias to 500 mV, start one topography scan, wait until status returns to `<idle>`, then set 400 mV and repeat.”
In plan_execute mode, use short, time-sensitive sequences that must run quickly with no observe/think between execution steps.
In chat mode, ask for planning, SOP drafting, or sanity checks without touching the controlling software.
Use `/connect` to select provider + auth mode before runs. You can type it directly in the input box, or open the command palette (Ctrl+P) and select `/connect` to complete the provider/mode choice interactively.
Supported connect workflows:
- `/connect openai chatgpt/codex`
- `/connect openai api`
- `/connect gemini api`
- `/connect claude api`
- `/connect grok api`
- `/connect doubao api`
OpenAI ChatGPT fallback behavior:
- If `/connect openai chatgpt/codex` fails, the TUI shows a fallback prompt:
  - `1` retry OpenAI ChatGPT connect
  - `2` switch to OpenAI API mode (`/connect openai api`)
  - `3` cancel
In the TUI, type `/help` (or `/menu`) to show commands. Settings persist in `sessions/.tui_settings.json`.
Core settings
- `/workspace [path]`: get/set the active workspace file.
- `/mode`: show current mode.
- `/mode chat|react|plan_execute`: set execution mode.
- `/agent_model [name]` (alias: `/model`): get/set the main “decision” model.
- `/tool_call_model [name]`: get/set the perception/tool helper model.
- `/max_agent_steps [int]`: limit steps per run (prevents runaway loops).
- `/action_delay [seconds]`: delay between UI actions (stability vs speed).
- `/abort_hotkey [on|off]`: enable/disable cooperative abort from the TUI (Ctrl+C).
- `/log_dir [path]`: set where runs write logs (default: `logs`).
Memory
- `/memory_turn`: show current `memory_turns`.
- `/set_memory_turn [-1|N]`: `-1` = full memory; `0` = none; `N` = keep last N entries.
- `/memory_compress_threshold [int tokens]`: set auto-compress threshold (`0` disables auto-compress).
- `/compress_memory`: manually compress memory now (moves details to archive and keeps summaries).
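The interplay of `memory_turns` and the compress threshold can be sketched as follows (stub summarizer; the agent uses a model for real compression):

```python
# Sketch of the memory policy described above: keep the last N turns verbatim
# and fold older turns into a single summary once the size threshold is hit.
# -1 means full memory (never compress); 0 means keep no verbatim turns.
def compress_memory(turns, keep_last, threshold,
                    summarize=lambda old: f"[summary of {len(old)} turns]"):
    if keep_last == -1 or sum(len(t) for t in turns) < threshold:
        return turns
    recent = turns[-keep_last:] if keep_last > 0 else []
    old = turns[:len(turns) - len(recent)]
    return ([summarize(old)] if old else []) + recent
```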
Sessions
- `/chat new`: start a new session (clears transcript + memory).
- `/chat save [name]`: save current transcript + agent state to `sessions/<name>.json`.
- `/chat list`: list saved sessions.
- `/chat resume <name>`: load a saved session.
Provider auth
- `/connect`: show current provider auth status summary.
- `/connect <openai|gemini|claude|grok|doubao> <api|chatgpt/codex>`: connect provider mode (`chatgpt/codex` is only valid for `openai`).
Maintenance
- `/calibration_tool`: launch the calibrator for the current workspace.
- `/clear_cache`: delete log folders on disk (asks for confirmation).
- Enter: newline
- Ctrl+S: send input
- Ctrl+I: focus input
- Ctrl+L: focus transcript
- Ctrl+P: open command palette
- Shift+Mouse: select/copy transcript (Esc to close)
- Ctrl+Q: quit
- PageUp/PageDown: scroll transcript
- Ctrl+C: request abort (when `/abort_hotkey` is ON)
Each program run creates a timestamped folder under `logs/` (or your configured `/log_dir`).
Typical structure:
```
logs/<YYYYMMDD_HHMMSS>/
  cli_get_<parameter>/meta.json
  cli_set_<parameter>/meta.json
  cli_ramp_<parameter>/meta.json
  cli_action_<action_name>/meta.json
  click_<anchor>/meta.json
  set_field_<anchor>/meta.json
  wait_until_<roi>_sleep/meta.json + before_<roi>.png
  observe_<attempt>_<roi>/meta.json + roi_<roi>.png
```
These logs are designed to make automation reviewable: you can inspect exactly what the agent saw and did.
This project can control instrument state (CLI) and your mouse/keyboard (GUI fallback).
- PyAutoGUI failsafe: move the mouse to the top-left corner to trigger `pyautogui.FAILSAFE`.
- Prefer running on a dedicated machine / dedicated desktop session so other apps don’t steal focus.
- Start with conservative `/action_delay` and small `/max_agent_steps` until your workspace is calibrated and stable.

