SAGE — Seismology AI-Guided Engine

Conversational AI Platform for Seismology Research

SAGE is a web-first AI workbench for seismology and geophysics. It brings chat-based Q&A, scientific analysis, parameter optimization, knowledge-base retrieval, OpenAI-style SKILLs, code execution, GMT/Python plotting, and paper writing into one interface. Users can upload data, papers, and notes, then let the system infer file roles, plan scientific questions, generate figures and tables, integrate evidence, and produce reproducible reports, Markdown papers, and LaTeX drafts.

The command-line tool seismic_cli.py is still available, but it is now mainly a scripting, batch-processing, and debugging interface. Daily workflows are best handled through the web UI.

Start Here: One-Command Launch With `sagectl.sh`

For most users, sagectl.sh is the only command you need to remember. It checks the Python environment, installs missing dependencies when needed, preserves your last selected LLM backend, starts the SAGE web app in the background, and opens the browser.

git clone --recurse-submodules https://github.com/cangyeone/sage.git
cd sage
chmod +x sagectl.sh
./sagectl.sh

Open the web UI at:

http://127.0.0.1:5010

What `sagectl.sh` Does

Finds a supported Python environment, preferring Python 3.9-3.12.
Installs required dependencies into .venv/ if the current environment is not ready.
Runs a lightweight backend configuration step and does not pause startup to prompt for knowledge-base rebuilds.
Keeps your previous backend choice, such as DeepSeek, instead of switching to Ollama just because Ollama is running.
Starts web_app/app.py as a background service and writes logs to .sage_runtime/logs/.
Stores SAGE runtime data in the project-local .seismicx/ directory, including config, run records, local model paths, workflows, and knowledge metadata.
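
The interpreter check in the first step can be approximated in a few lines of Python. This is a minimal sketch, not the logic inside sagectl.sh: the function name `is_supported_python` is hypothetical, and the 3.9-3.12 range is taken from the list above.

```python
import sys

# Range taken from the list above; sagectl.sh may apply additional checks.
MIN_VERSION = (3, 9)
MAX_VERSION = (3, 12)


def is_supported_python(version=None):
    """Return True if (major, minor) falls inside the preferred range."""
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    verdict = "supported" if is_supported_python() else "outside the preferred range"
    print(f"Python {current} is {verdict}")
```

Running the snippet with the interpreter you plan to use is a quick way to see whether `./sagectl.sh doctor` is likely to flag the Python version.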

Daily Commands

./sagectl.sh              # setup if needed, configure, start, open browser
./sagectl.sh status       # show service status, URL, runtime dir, and log path
./sagectl.sh logs         # tail the web server log
./sagectl.sh stop         # stop the background web app
./sagectl.sh start        # start the background web app
./sagectl.sh restart      # restart the web app
./sagectl.sh open         # open the web UI in your browser
./sagectl.sh doctor       # check Python, dependencies, port, and backend status

Useful Options

Use another port:

SAGE_PORT=5011 ./sagectl.sh

Run without opening a browser:

SAGE_AUTO_OPEN=0 ./sagectl.sh

Choose script output language. English is the default:

./sagectl.sh --lang en
./sagectl.sh --lang zh
SAGE_LANG=zh ./sagectl.sh

Use a custom project runtime directory. By default this is .seismicx/ inside the repository:

SAGE_HOME=/path/to/project/.seismicx ./sagectl.sh

If you use local Ollama models:

./sagectl.sh ollama-start
ollama pull qwen3:8b

SAGE is web-first: after the service is running, use /chat, /knowledge, /skills, /llm-settings, /science-analysis-agent, and /parameter-optimization-agent from the browser.

Start Here: One-Command Launch With sagectl.sh
Features Overview
System Architecture
Software Structure Documentation
Quick Start
Installation
Configuring LLM Backend
Web Interface
Project Coding Workspaces
Built-in Coding Agent
Desktop GUI Control
Command Line Tools (Advanced/Fallback)
Conversation Routing Mechanism
seismo_skill Skill System
seismo_script Workflow System
GMT Map Drawing
Core Modules Details
Directory Structure
Configuration Files
FAQ
Acknowledgements
Contact
License

Features Overview

Module	Current Role	Core Capabilities
Chat	Everyday entry point	Streaming chat, temporary PDF reading, RAG Q&A, web search, image/table understanding, project-aware code execution, GMT/Python plotting, and multi-SKILL use
Scientific Analysis Agent	Main research entry point	Traverses project folders, identifies data/papers/notes, searches local and online literature, proposes scientific questions, plans figures/tables, runs CodeEngine, and drafts Markdown/LaTeX papers
Parameter Optimization Agent	Workflow/model optimization	Lets users define modules, inputs/outputs, parameters, and objectives; LLM understands the workflow while CodeEngine implements, debugs, monitors, and saves optimization traces
Knowledge Base	Persistent knowledge layer	Ingests PDFs, Markdown, projects, and chat exports; combines BGE-M3/FAISS or fallback retrieval with keyword search; supports deletion and incremental updates
Skill System	Capability extension	Supports OpenAI-style folder SKILLs, built-in SKILLs, documentation-generated SKILLs, and academic research SKILLs; usable by Chat, Science Analysis, Parameter Optimization, and CodeEngine
CodeEngine	Execution and debugging core	Generates Python/GMT/Bash scripts, edits project files, runs unit/full tests, performs self-debug loops, and saves figures, tables, logs, and engineering plans
LLM/Config	Global settings	Configures Ollama, local models, online APIs, OpenAI-compatible APIs, web search providers, workspaces, coding backends, and multimodal capabilities
Seismology Toolkits	Domain tools	Phase picking, event association, polarity analysis, b-value statistics, waveform processing, GMT maps, and 3D terrain/velocity visualization

System Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                          Web UI                                      │
│ /chat  /science-analysis-agent  /parameter-optimization-agent         │
│ /knowledge  /skills  /config                                         │
└───────────────┬──────────────────────────────────────────────────────┘
                │ REST + SSE streaming
┌───────────────▼──────────────────────────────────────────────────────┐
│                         Agent Orchestration                          │
│ intent routing · project isolation · background jobs · stop/resume    │
│ evidence tracking · reviewer-style iteration · multilingual prompts   │
└───────┬───────────────┬──────────────────┬──────────────────────────┘
        │               │                  │
┌───────▼──────┐ ┌──────▼──────┐  ┌────────▼────────┐
│ CodeEngine   │ │ RAG/Search  │  │ Skill Loader    │
│ Python/GMT   │ │ BGE-M3/FAISS│  │ OpenAI-style    │
│ Bash/LaTeX   │ │ keywords    │  │ built-in/user   │
│ mini tests   │ │ OpenAlex... │  │ nested subskills│
└───────┬──────┘ └──────┬──────┘  └────────┬────────┘
        │               │                  │
┌───────▼───────────────▼──────────────────▼──────────────────────────┐
│                         LLM Backends                                  │
│ Ollama · OpenAI-compatible APIs · DeepSeek · SiliconFlow · DashScope  │
│ optional multimodal models for figure/table/image analysis            │
└───────┬──────────────────────────────────────────────────────────────┘
        │
┌───────▼──────────────────────────────────────────────────────────────┐
│                         Domain Toolkits                               │
│ pnsn phase picking · GMT · ObsPy · statistics · document extraction   │
│ science paper templates · parameter optimization workflows            │
└──────────────────────────────────────────────────────────────────────┘

Software Structure Documentation

The detailed software architecture and built-in coding-agent contract live in docs/ARCHITECTURE.md. That document explains the Chat/RAG/SKILL/CodeEngine layers, the engineering-plan flow, persisted engineering_plan.md files, debug-time plan revisions, and the unit-test validation rules used by the coding agent. CodeEngine artifacts are registered with their conversation; deleting a conversation or project cleans up recognized CodeEngine temporary run directories and related engineering_plan*.md files.

Quick Start

# 1. Clone the main repository and bundled submodules
git clone --recurse-submodules https://github.com/cangyeone/sage.git
cd sage

# If you already cloned without submodules, run:
# git submodule update --init --recursive

# 2. One-command setup: install dependencies, configure, start the web app in background
chmod +x sagectl.sh
./sagectl.sh

The default URL is http://127.0.0.1:5010. On first access, select or enter a model on the Config page, then save it to start using the full system.

Common controls:

./sagectl.sh status    # Show web, port and log status
./sagectl.sh logs      # Tail background logs
./sagectl.sh stop      # Stop the background web app
./sagectl.sh start     # Start it again
./sagectl.sh restart   # Restart

Use another port:

SAGE_PORT=5011 ./sagectl.sh start

If you use local Ollama:

./sagectl.sh ollama-start
ollama pull qwen3:8b

Manual fallback:

pip install -r requirements.txt
python web_app/app.py --port 5010

Installation

System Requirements

Resource	Minimum Requirements	Recommended Configuration
Operating System	macOS / Linux / Windows	macOS 13+ / Ubuntu 22.04+
Python	3.9	3.10 / 3.11 / 3.12 (avoid 3.13+ for now)
Memory (RAM)	8 GB	16 GB+ (for running local LLM)
Storage Space	5 GB	30 GB+ (models + knowledge base)
GPU	Optional	CUDA 11.8+ or Apple Metal (for accelerated inference)

Basic Installation

git clone --recurse-submodules https://github.com/cangyeone/sage.git
cd sage

# Complete installation (recommended)
pip install -r requirements.txt

# Or install parts on demand
pip install flask flask-cors                          # Web services
pip install obspy torch scipy numpy pandas            # Seismic data processing
pip install matplotlib plotly                         # Visualization
pip install FlagEmbedding faiss-cpu pdfminer.six PyMuPDF  # RAG Knowledge Base

pnsn Phase Picking Module Location

pnsn is a deep learning model library specifically for phase picking, developed by cangyeone. In SAGE, pnsn is managed as part of the OpenAI-style skill pnsn_phase_detection; the expected location is seismo_skill/skills/pnsn_phase_detection/pnsn/, so the code, configuration, and model files stay with the skill.

# Only needed if the skill-local pnsn folder is missing
git clone https://github.com/cangyeone/pnsn.git \
  seismo_skill/skills/pnsn_phase_detection/pnsn

The current pnsn repository does not provide a separate requirements.txt and does not need to be installed as a Python package for SAGE usage. Install SAGE's top-level dependencies with pip install -r requirements.txt; SAGE then directly calls files such as seismo_skill/skills/pnsn_phase_detection/pnsn/picker.py, fastlinker.py, and pickers/*.jit.

Confirm that the directory structure looks like this:

sage/
├── seismo_skill/
│   └── skills/
│       └── pnsn_phase_detection/
│           ├── SKILL.md
│           └── pnsn/       ← Skill-local pnsn code and models
│               ├── picker.py
│               ├── fastlinker.py
│               ├── gammalink.py
│               ├── pickers/  ← JIT / ONNX model files
│               └── config/
├── web_app/
└── ...

Main models provided by pnsn:

Model	Purpose	Format
PhaseNet	P/S wave arrival picking	JIT / ONNX
EQTransformer	Event detection + phase picking integration	JIT / ONNX
JMA Picker	Picker based on JMA algorithm	JIT

RAG Dependencies

The knowledge-base RAG features require the tokenizers library, which on some systems needs a Rust toolchain to compile:

# Install Rust (only needed when pip install reports compilation errors)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env

# Reinstall the embedding libraries
pip install FlagEmbedding sentence-transformers

# On first use, the BGE-M3 model (~2 GB) is downloaded automatically from HuggingFace.
# If HuggingFace fails and `modelscope` is installed, SAGE automatically tries
# ModelScope and stores the model at open_models/bge-m3.
# Users in mainland China can also set a mirror:
export HF_ENDPOINT=https://hf-mirror.com

Alternative: Download BGE-M3 via ModelScope (recommended for users in China)

If HuggingFace is inaccessible, install ModelScope. SAGE will then automatically try the fallback download. You can also download the model manually:

pip install modelscope

modelscope download --model BAAI/bge-m3 --local_dir open_models/bge-m3

Then configure the local path in SAGE so it uses the downloaded model instead of downloading from the internet. There are two ways:

Option 1 — Web Interface (Recommended): Open the Knowledge Base page (/knowledge) → click the ⚙ gear icon next to "Embedding Model" → paste the absolute path (e.g. /Users/yourname/open_models/bge-m3) → click Save.

Option 2 — Edit config directly: Add an embedding section to ~/.seismicx/config.json:

{
  "llm": { "...": "..." },
  "embedding": {
    "model_path": "/Users/yourname/open_models/bge-m3"
  }
}

Leave model_path as an empty string or omit the field entirely to revert to HuggingFace auto-download. The setting takes effect on the next document build — no restart required.
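
The fallback rule described above (a non-empty path means a local model, while an empty string or a missing field means auto-download) can be sketched as follows. The function name `resolve_embedding_path` is hypothetical and only illustrates the rule; it is not SAGE's actual loader.

```python
import json
from pathlib import Path


def resolve_embedding_path(config_file):
    """Return the configured local model path, or None to signal auto-download."""
    config_file = Path(config_file).expanduser()
    if not config_file.exists():
        return None
    config = json.loads(config_file.read_text(encoding="utf-8"))
    # Tolerate a missing or null "embedding" section.
    embedding = config.get("embedding") or {}
    model_path = embedding.get("model_path", "")
    # An empty string and an omitted field both mean "use the default download".
    return model_path or None
```

Calling `resolve_embedding_path("~/.seismicx/config.json")` after editing the file is a quick way to check that the JSON is valid and that the path is read the way you expect.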

Configuring LLM Backend

All AI functions require an LLM backend. The recommended path is Web Interface → Config, where you can configure local models, online APIs, web search providers, coding backends, and workspaces. Project-related settings are stored in the project directory when possible, making them easy to move or clean up; only a small set of user defaults is maintained globally.

Method 1: Ollama (Recommended, local, no internet required)

# 1. Install Ollama
# macOS / Linux:
curl -fsSL https://ollama.ai/install.sh | sh
# Or visit https://ollama.ai/download

# 2. Start service
ollama serve

# 3. Pull model (select based on VRAM / Memory)
ollama pull qwen3:8b         # ~6 GB, suitable for daily use
ollama pull qwen3:30b        # ~20 GB, comprehensive capabilities
ollama pull deepseek-r1:8b   # ~9 GB, strong reasoning capability
ollama pull llama3.3:latest  # ~40 GB, strong English capability

Select the model on the Config page and click "Save Configuration" to complete setup.

Method 2: Online API (OpenAI Compatible Format)

On the Config page → select "Custom API" and fill in:

Field	Example (DeepSeek)	Example (SiliconFlow)
API Base URL	`https://api.deepseek.com/v1`	`https://api.siliconflow.cn/v1`
API Key	`sk-xxxxxxxx`	`sk-xxxxxxxx`
Model Name	`deepseek-chat`	`Qwen/Qwen2.5-72B-Instruct`

Any OpenAI-compatible interface is supported, including DeepSeek, SiliconFlow, Moonshot (Kimi), Alibaba Tongyi (DashScope), Zhipu GLM, and Anthropic.

Method 3: Command Line Configuration

# Ollama local model
python seismic_cli.py backend use ollama --model qwen3:30b

# Online API
python seismic_cli.py backend use online \
    --provider deepseek \
    --api-key sk-xxx \
    --model deepseek-chat

# View all backend status
python seismic_cli.py backend status

# Auto-detect available backends
python seismic_cli.py backend auto

Web Interface

After startup, visit http://127.0.0.1:5010. The web UI is the recommended primary entry point because it supports background jobs, real-time streaming, isolated chats/projects, persistent files, rich figure rendering, and bilingual UI.

Chat (/chat)

Use Chat for daily Q&A, paper reading, quick plotting, and small data-processing tasks. It routes each request to QA, RAG, web search, SKILLs, or CodeEngine as needed. Long-running jobs can continue while you switch pages; returning to the chat shows accumulated output and resumes live streaming.

Task	Example
Paper Q&A	Upload a PDF, then ask “What are the core methods and equations?”
Data processing	“Bandpass this mseed file at 1-10 Hz and mark phase picks.”
GMT/Python plotting	“Draw a China topography map with GMT and put a red star at the center.”
Web research	“Search for current seismology foundation models and list the sources.”
Multi-SKILL use	“Use the GMT docs skill and 3D terrain skill to draw Sichuan terrain.”

Uploaded chat files are temporary unless explicitly added to the knowledge base. Chats and projects can also be saved into the knowledge base for later retrieval.

Project Coding Workspaces

Chat projects can be connected to a real local project directory. In the Chat sidebar:

Click Projects → New or select an existing project.
Click Settings.
Fill Project path / coding workspace with an absolute local directory, for example:

/Users/you/Documents/GitHub/my-research-code

After this is set, executable coding requests in that project use the selected directory as the active workspace:

Code search, file discovery, repo maps, and symbol lookup run inside the project path.
Generated edit scripts modify files under that project directory, not the SAGE source tree.
Unit tests and validation run with the project path as the working directory.
External coding backends, when enabled, also receive the project path as their workspace.
Each chat session + project path gets an isolated CodeEngine context, so different projects do not leak code history into each other.

If no project path is set, SAGE falls back to its own repository root, which is useful when you are developing SAGE itself.

Typical project prompts:

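Help me locate the code in this project that reads the catalog, and suggest modifications.
Implement a CSV catalog loader, write unit tests, then run the related tests.
Refactor this project's plotting module and run the full test suite.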
Run the full test suite and fix any failures.

Built-in Coding Agent

The built-in coding agent is designed for engineering-style work, not only one-off scripts. For repository tasks it follows this loop:

Search files with rg, build a compact repo/symbol map, and choose relevant files.
Write an engineering_plan.md that records route, files, API details, unit tests, and validation commands.
Generate an edit-and-test script that can modify multiple files.
Run the script, inspect failures, and debug with the same skill/RAG/API references.
Run validation:
- py_compile for changed Python files.
- Targeted pytest for changed or related tests.
- Full validation when explicitly requested, with phrases such as 全量测试 (full test run), 完整构建 (full build), all tests, or full build.
Save generated figures, output files, code, debug traces, and engineering plans as downloadable artifacts.

The agent can add, update, insert, or delete focused tests when the task requires it. Test deletion is allowed only when the test is obsolete, asserts wrong behavior, or is replaced by equivalent or better coverage.
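
The py_compile step in the validation list above can be reproduced by hand with the standard library. This is a minimal sketch under the assumption that the changed files are passed in as a list of paths; the helper name `compile_changed_files` is hypothetical and is not part of SAGE.

```python
import py_compile
from pathlib import Path


def compile_changed_files(paths):
    """Byte-compile each changed .py file and collect syntax errors.

    Returns a dict mapping file path -> error message; an empty dict
    means every Python file compiled cleanly.
    """
    failures = {}
    for path in map(Path, paths):
        if path.suffix != ".py":
            continue  # only Python sources are checked at this stage
        try:
            # doraise=True turns syntax errors into PyCompileError.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            failures[str(path)] = exc.msg
    return failures
```

Note that `py_compile.compile` writes bytecode into `__pycache__/` next to each file, and a path that does not exist raises `FileNotFoundError` rather than being reported as a failure.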

For simple requests such as “Give me a program that computes the b-value”, SAGE may return a code draft without executing it. For requests such as “Run /data/catalog.csv and compute the b-value” or “Change the title of the previous figure”, SAGE routes to executable coding.

Full-project validation is conservative and uses common project conventions:

Project type	Full validation command
Python project with `tests/`	`python -m pytest`
Node project with `package.json`	`npm test`, then `npm run build`

Projects with custom build systems should include the desired command in the prompt or project shared prompt.
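
The conventions in the table above can be expressed as a small lookup on the project layout. This is a sketch only: `full_validation_commands` is a hypothetical helper, and the source does not say what happens when a project matches both rows, so this version simply returns both sets of commands.

```python
from pathlib import Path


def full_validation_commands(project_root):
    """Pick full-validation commands from the conventions in the table above."""
    root = Path(project_root)
    commands = []
    if (root / "tests").is_dir():
        # Python project with a tests/ directory
        commands.append(["python", "-m", "pytest"])
    if (root / "package.json").is_file():
        # Node project: run the tests first, then the build
        commands.append(["npm", "test"])
        commands.append(["npm", "run", "build"])
    return commands
```

An empty list corresponds to the custom-build-system case, where the command has to come from the prompt or the project shared prompt.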

Desktop GUI Control

SAGE includes a built-in GUI automation skill for tasks that explicitly need desktop control. Generated code can use:

from seismo_code.gui_automation import (
    backend_status, screenshot, click, drag, move_to,
    type_text, hotkey, scroll, GuiAutomationError,
)

Supported actions include screenshots, coordinate clicks, dragging, typing, hotkeys, and scrolling. The backend is selected automatically:

Preferred: optional pyautogui.
macOS screenshot fallback: screencapture; mouse fallback: cliclick if installed.
Linux X11 fallback: xdotool; screenshots via gnome-screenshot or ImageMagick import.

Notes:

macOS may require Accessibility and Screen Recording permissions.
Linux Wayland may block global mouse/keyboard automation.
Browser pages should normally use browser automation rather than pixel clicks.
The agent does not guess at text or OCR-based click targets; when text targeting is unavailable, it should take a screenshot and use coordinates.

Scientific Analysis (/science-analysis-agent)

This is the main page for “given data and papers, produce a scientific analysis.” Put data, papers, notes, scripts, and templates under one project directory; the agent recursively traverses the directory and infers file roles.

Core workflow:

Identify data, field notes, papers, existing figures, and LaTeX/Markdown templates.
Use local literature, the knowledge base, web search, and SKILLs to propose testable scientific questions.
Let the LLM plan paper figures, tables, statistics, and falsification paths.
Run CodeEngine to parse data, compute statistics, draw figures, and debug failures.
Draft a Markdown paper from figures and evidence; optionally generate LaTeX/PDF.
Run strict reviewer-style self-critique loops to add/remove figures and refine claims.

Scientific Analysis is intended to move beyond data-quality summaries and organize work around “question → evidence → figures/tables → paper conclusions.”

Parameter Optimization (/parameter-optimization-agent)

Use this page to define optimizable workflows such as phase-picking model training, signal-processing parameter search, inversion parameter tuning, or custom scientific pipelines. Users define inputs, outputs, tunable parameters, and objectives; the LLM interprets the workflow while CodeEngine implements, runs, debugs, and monitors it. Optimization traces and results can feed back into Scientific Analysis.

Knowledge Base (/knowledge)

The knowledge base stores long-term retrievable materials: PDFs, Markdown documents, chat/project exports, scientific-analysis projects, and SKILL/RAG helper indexes generated from documentation. Retrieval combines vector search and keyword search; Chinese text can use jieba tokenization. BGE-M3 is preferred for embeddings, with lightweight fallback when unavailable.
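
The text above does not say how the vector and keyword result lists are merged. As an illustration only, the sketch below uses reciprocal rank fusion (RRF), a common technique that needs nothing but the rank positions; it is swapped in here as an assumption and is not SAGE's documented method. The function name and the constant `k=60` are likewise illustrative.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document receives 1 / (k + rank) from every list it appears in;
    documents ranked highly by several retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Higher fused score first; ties are broken by id for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


vector_hits = ["a", "b", "c"]    # hypothetical ids from embedding search
keyword_hits = ["b", "d", "a"]   # hypothetical ids from keyword search
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

Because RRF ignores raw scores, it works even when the embedding model is replaced by the lightweight fallback and the two retrievers produce scores on different scales.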

Deletion and incremental updates are supported. Removing an item also cleans related RAG entries, project records, and generated SKILL links where applicable.

Skills (/skills)

The skill system uses OpenAI-style folder SKILLs and supports built-in skills plus user skills under seismo_skill/user_skills/. Files or folders under seismo_skill/docs/ can be converted into skills through the web UI. For large documentation sets, SAGE can first use RAG/vector clustering to group similar content into subskills/, then ask an LLM to standardize the notes, examples, and constraints.

Skills can be jointly used by Chat, Scientific Analysis, Parameter Optimization, and CodeEngine. Failed or obsolete generated skills can be deleted from the UI.

SeismicX-Cont Continuous Monitoring Skill

SAGE includes the OpenAI-style skill continuous_seismic_monitoring for the publish_mini continuous-waveform benchmark code. Large waveform data, picker weights, and generated benchmark outputs are intentionally not committed. Download the SeismicX-Cont data from ModelScope, then place it under publish_mini/data/ or inside your project data directory.

The skill provides three skill-local workflows that can be invoked from Chat, Scientific Analysis, Parameter Optimization, and CodeEngine:

continuous_dataset_build: convert user-provided waveform/catalog/station inputs into a continuous waveform dataset with HDF5/SQLite indexes.
continuous_picker_benchmark: run a provided picker, evaluate phase-pick precision/recall, optionally associate events, and report earthquake recall statistics.
continuous_waveform_detection: run continuous earthquake detection on waveform windows using the dataset/index artifacts from the dataset workflow.
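
To make the precision/recall figures reported by continuous_picker_benchmark concrete, here is a minimal sketch of how phase-pick precision and recall are commonly computed. The matching rule (one-to-one pairing within a time tolerance) and the 0.5 s default are assumptions for illustration; the benchmark's actual criteria are defined by the skill itself.

```python
def match_picks(predicted, reference, tolerance=0.5):
    """Count true positives, pairing each reference pick with at most one prediction.

    Both inputs are pick times in seconds for one station and one phase.
    """
    predicted = sorted(predicted)
    reference = sorted(reference)
    i = j = true_positives = 0
    while i < len(predicted) and j < len(reference):
        delta = predicted[i] - reference[j]
        if abs(delta) <= tolerance:
            true_positives += 1
            i += 1
            j += 1
        elif delta < 0:
            i += 1  # prediction is too early for this and all later references
        else:
            j += 1  # no remaining prediction can match this reference
    return true_positives


def precision_recall(predicted, reference, tolerance=0.5):
    """Return (precision, recall) for one list of picks against a reference list."""
    tp = match_picks(predicted, reference, tolerance)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    return precision, recall
```

For example, predictions at 10.1 s, 20.0 s, and 35.0 s against reference picks at 10.0 s, 20.3 s, and 50.0 s give two matches, so precision and recall are both 2/3.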

Example prompts:

Use the continuous seismic monitoring workflow to build a continuous waveform dataset from my project data.
Benchmark this PhaseNet picker on the SeismicX-Cont mini dataset and report P/S recall plus event association recall.
Use the continuous waveform detection workflow to detect earthquakes from the continuous data in this project.

Config (/config)

The Config page manages models and system capabilities:

Ollama, local models, online OpenAI-compatible APIs.
DeepSeek, SiliconFlow, DashScope, Moonshot/Kimi, Zhipu, and custom APIs.
Web search providers such as OpenAlex, Semantic Scholar, arXiv, and custom services.
Chat/agent workspaces, extra authorized roots, coding backends, and multimodal capabilities.
Thinking visibility, RAG, web search, and figure/table parsing switches.

Command Line Tools (Advanced/Fallback)

seismic_cli.py remains available, but it is now intended for scripting, batch jobs, and debugging. For normal interaction, scientific analysis, parameter optimization, knowledge management, and skill management, use the web UI first.

Common commands:

# Inspect or auto-select model backend
python seismic_cli.py backend status
python seismic_cli.py backend auto

# Lightweight chat/code entry points
python seismic_cli.py chat
python seismic_cli.py run "bandpass /data/wave.mseed at 1-10 Hz and plot it"

# Batch phase picking and event association
python seismic_cli.py pick -i /data/seismic/2024/ --batch -o results/picks.csv
python seismic_cli.py associate -i results/picks.csv -s station_list.csv --method fastlink -o results/events.txt

# Seismic statistics scripts
python seismic_cli.py stats bvalue -i catalog.csv --mc auto
python seismic_cli.py stats report -i catalog.csv

Use the root launcher for the web service:

./sagectl.sh          # install deps, configure, start in background
./sagectl.sh status   # show status
./sagectl.sh logs     # tail logs
./sagectl.sh stop     # stop web service

Conversation Routing Mechanism

SAGE determines message intent primarily through a dedicated LLM router, not broad QA keyword rules. QA is deliberately low-priority: requests that create, modify, run, plot, test, or continue a previous coding result are routed to CodeEngine even if they contain words such as "how", "注意" (note), "支持" (support), or "explain".

Routing Flow

User message
   │
   ├─ Explicit execution/artifact/refinement guard
   │      ├─ paths, plotting, processing, GUI control, or file generation → code
   │      └─ previous CodeEngine result + "modify/refine/support/change" → code
   │
   └─ LLM routing call (max_tokens=10, approximately <1s)
          │
          ├─ code_draft → write code only, without execution
          ├─ code  → CodeEngine generates and executes Python / GMT code
          ├─ chain → paper-method extraction followed by implementation
          ├─ qa    → RAG retrieves knowledge base + LLM response
          └─ chat  → General conversation

Routing Types

Route	Trigger Condition	Example
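`code_draft`	User only asks for a program/script and does not ask SAGE to run it	"Give me a program that computes the b-value"
`code`	Data processing, plotting, file operations, GUI control, repo edits, tests, builds	"Draw a topographic map of China with Python", "Change the title of the previous figure to Chinese", "Run the full test suite and fix the failures"
`chain`	Uploaded paper/literature method reproduction	"Reproduce the method in this paper and implement the code"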
`qa`	Explicit concept explanation, summary, or method discussion with no requested output change	"What is Q-filter?", "Explain the principle of HVSR"
`chat`	Greetings, chatting, non-seismological content	"Hello", "How is the weather today"

Fallback rules when LLM routing is unavailable:

Strong code guards still route to code: paths, artifact creation, GUI control, or refinement of a previous CodeEngine result.
Otherwise SAGE falls back to chat, not QA. This avoids accidentally turning code/figure modification requests into explanatory answers.
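
The two fallback rules above amount to a guard-first decision with chat as the default. The sketch below illustrates that shape; the regular expression, the word lists, and the function name `fallback_route` are all hypothetical stand-ins, since the real guards are not spelled out here.

```python
import re

# Hypothetical guard vocabulary, chosen only for illustration.
PATH_PATTERN = re.compile(
    r"(?:~|\.{1,2})?/[\w.-]+(?:/[\w.-]+)+"    # multi-segment paths such as /data/catalog.csv
    r"|\b[\w-]+\.(?:mseed|sac|csv|py|txt)\b"   # bare file names with a known extension
)
ACTION_WORDS = {"plot", "draw", "save", "generate", "export", "click", "screenshot"}
REFINE_WORDS = {"modify", "refine", "support", "change"}


def fallback_route(message, has_previous_code_result=False):
    """Route without the LLM router: strong guards go to code, the rest to chat."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    if PATH_PATTERN.search(message):
        return "code"
    if words & ACTION_WORDS:
        return "code"
    if has_previous_code_result and words & REFINE_WORDS:
        return "code"
    # Deliberately chat, not qa: a missed code request should not
    # be turned into an explanatory answer.
    return "chat"
```

With these assumptions, "Change the title to Chinese" routes to code only when the session already holds a CodeEngine result, which mirrors the refinement guard in the routing flow.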

seismo_skill Skill System

The skill system is SAGE's core extension mechanism. Each skill is built around a Markdown document (SKILL.md) that describes function usage and provides code examples. Skill documents are retrieved automatically and injected into the prompt during AI conversation and code generation, which helps the generated code follow the documented, standardized usage.

Working Principle

User message (natural language)
       │
       ▼
  seismo_skill keyword retrieval
  (Chinese-English mixed TF-IDF scoring)
       │
       ├─ Matched skill → inject function signature + example code into LLM system prompt
       │
       ▼
  LLM generates code / responds
  (prioritizes standardized writing in skill documents)

Skill retrieval is integrated at these points:

/api/chat/rag (Web knowledge Q&A)
seismo_code/code_engine.py (code generation engine)
seismo_agent/agent_loop.py (autonomous agent code generation at each step)
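
The keyword retrieval step in the diagram above can be illustrated with a small TF-IDF ranker. This is a sketch under stated assumptions: the tokenizer (lowercase ASCII words plus single CJK characters), the smoothed IDF formula, and the names `tokenize` and `rank_skills` are illustrative and do not reproduce the seismo_skill implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split into lowercase ASCII words and single CJK characters."""
    return re.findall(r"[a-z0-9_]+|[\u4e00-\u9fff]", text.lower())


def rank_skills(query, skill_docs):
    """Return skill names sorted by TF-IDF overlap with the query, best first."""
    tokenized = {name: Counter(tokenize(doc)) for name, doc in skill_docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for counts in tokenized.values():
        doc_freq.update(counts.keys())

    query_terms = set(tokenize(query))
    scores = {}
    for name, counts in tokenized.items():
        total = sum(counts.values()) or 1
        score = 0.0
        for term in query_terms:
            if term in counts:
                # Smoothed inverse document frequency; rarer terms weigh more.
                idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
                score += (counts[term] / total) * idf
        scores[name] = score
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return [name for name in ranked if scores[name] > 0]
```

A query that mixes Chinese and English, such as one naming both GMT and 地形图, matches on both kinds of token, which is the point of the mixed tokenizer. Skills with no overlapping terms are dropped, so nothing is injected into the prompt for unrelated messages.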

Built-in Skills

Built-in skills now use the OpenAI-style folder format: each skill directory contains SKILL.md and may include agents/, references/, assets/, workflows/, and subskills/.

Skill Directory	Main Use
`waveform_io/`	Waveform file reading, directory scanning, metadata extraction
`waveform_processing/`	Detrending, tapering, filtering, resampling, response removal
`waveform_visualization/`	Waveform plots, spectrograms, PSD, particle motion
`spectral_analysis/`	Spectrum, HVSR, spectral ratios, and frequency-domain analysis
`b_value_analysis/`	b-value, magnitude completeness, G-R relation, seismicity statistics
`source_parameters/`	Magnitude, corner frequency, seismic moment, moment magnitude, stress drop
`gmt_plotting/`	GMT maps, topography, epicenters, sections, and focal mechanisms
`terrain_3d_plotting/`	Python/Plotly/Three.js-style 3D terrain visualization
`pnsn_phase_detection/`	PhaseNet/EQTransformer phase picking and monitoring workflows
`tabular_io/`	CSV/Excel/text table reading and field inference
`cartopy_plotting/`	Cartopy map plotting fallback
`nature-figure/`, `nature-data/`, `nature-polishing/`	Academic figures, data organization, and paper polishing

Creating Custom Skills

Method 1: Web Interface (Recommended)

Visit /skills → Click "Create Custom Skill" → Fill in basic information → Complete documentation in editor.

Method 2: Command Line

python seismic_cli.py skill new my_hypodd_tool \
    --title "HypoDD Double Difference Location Tool" \
    --keywords "double difference location, HypoDD, precise location, relocation" \
    --desc "Package HypoDD input file generation and result parsing"

Method 3: Write an OpenAI-style skill folder directly

Create SKILL.md under seismo_skill/user_skills/<skill_name>/:

seismo_skill/user_skills/my_skill/
├── SKILL.md
├── subskills/
│   └── station_metadata.md
├── references/
│   └── example_catalog.md
└── agents/
    └── debug_notes.md

Recommended SKILL.md sections:

# Skill Title / 技能标题

## When to use / 何时使用

## Inputs and outputs / 输入与输出

## Workflow / 工作步骤

## Examples / 代码示例

Override rule: when a custom skill has the same name as a built-in skill, the custom version automatically takes priority.
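
The override rule can be pictured as a two-pass directory scan in which user skills are read last and therefore replace built-in skills of the same name. This is a minimal sketch; `discover_skills` is a hypothetical helper, and the two root directories are passed in rather than assumed.

```python
from pathlib import Path


def discover_skills(builtin_root, user_root):
    """Map skill name -> folder, letting user skills shadow built-in ones."""
    skills = {}
    # The user root is scanned last, so its entries overwrite built-in ones.
    for root in (Path(builtin_root), Path(user_root)):
        if not root.is_dir():
            continue
        for folder in sorted(root.iterdir()):
            # A folder counts as a skill only if it contains SKILL.md.
            if (folder / "SKILL.md").is_file():
                skills[folder.name] = folder
    return skills
```

For example, `discover_skills("seismo_skill/skills", "seismo_skill/user_skills")` would list each skill once, with any same-named user skill pointing at the user_skills/ folder.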

Building Documentation and Skills

SAGE can convert external documentation into OpenAI-style folder skills. The typical workflow is: put a documentation folder under seismo_skill/docs/, open the web Knowledge page, choose a skill structure, and let the Skill Builder create a reusable skill under seismo_skill/user_skills/.

Example: Build a GMT Documentation Skill

Download GMT Chinese documentation

Download a release archive from gmt-china/GMT_docs releases. Prefer a release archive that contains the built source files, examples, and documentation assets.

You can also download from the terminal, replacing <release-asset-url> with the asset URL from the release page:
```
cd /path/to/sage
mkdir -p seismo_skill/docs
curl -L "<release-asset-url>" -o /tmp/GMT_docs.zip
unzip /tmp/GMT_docs.zip -d seismo_skill/docs/
```
Make sure the final layout is a single documentation folder, for example:
```
seismo_skill/docs/GMT_docs-6.5/
  source/
  README.md
  ...
```

Start SAGE Web
```
python web_app/app.py --port 5010
```
Open http://localhost:5010/knowledge.

Generate the SKILL in the web UI

In the Skill Docs Directory card:
- Click refresh and select GMT_docs-6.5.
- Set SKILL structure to OpenAI-style folder SKILL.
- Enable RAG/vector-assisted build if the folder has many files. This uses retrieval and clustering only as build assistance; the final result is still a SKILL, not a permanent RAG index.
- Optionally set the target topic cluster count. Leave it blank to let SAGE suggest one automatically.
- Click Start Build.

Generated output

The generated skill is written to:
```
seismo_skill/user_skills/_gen_gmt_docs_zh/
  SKILL.md
  subskills/
  references/
  workflows/
  agents/
```
The top-level SKILL.md is the entry point. The subskills/ directory contains clustered reusable GMT subskills, and references/manifest.md records the source files used during construction.

Verify the generated SKILL is used

Ask in Chat or Code mode:
```
How do I use GMT's -J projection option? Give me a few common projection examples.
```
```
Draw a topographic map with GMT grdimage, add a colorbar, and explain the parameters.
```
You do not need to explicitly mention _gen_gmt_docs_zh; GMT-related keywords such as GMT, grdimage, makecpt, coast, -J, -R, and -B should automatically retrieve the generated documentation skill together with the built-in gmt_plotting skill.
Manage or delete generated skills

Generated skills appear in the Knowledge/Skill management UI as skill assets and can be deleted from there. Deletion removes both the generated SKILL folder and its build metadata.

Supported documentation formats include PDF, Markdown (.md), reStructuredText (.rst), HTML, plain text, scripts, and mixed documentation folders.
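
Before starting a build, it can help to see what a documentation folder actually contains. The sketch below tallies files by extension. The suffix-to-format mapping and the function name `tally_doc_formats` are assumptions for illustration, not SAGE's actual detection logic.

```python
from collections import Counter
from pathlib import Path

# Assumed mapping from file suffix to the format names listed above.
# SAGE's real detection rules may differ.
SUFFIX_TO_FORMAT = {
    ".pdf": "PDF",
    ".md": "Markdown",
    ".rst": "reStructuredText",
    ".html": "HTML",
    ".htm": "HTML",
    ".txt": "plain text",
    ".py": "script",
    ".sh": "script",
}


def tally_doc_formats(doc_dir):
    """Count files under doc_dir (recursively) by assumed documentation format."""
    counts = Counter()
    for path in Path(doc_dir).rglob("*"):
        if path.is_file():
            # Unknown suffixes are grouped under "other" instead of being dropped.
            counts[SUFFIX_TO_FORMAT.get(path.suffix.lower(), "other")] += 1
    return dict(counts)
```

Calling `tally_doc_formats("seismo_skill/docs/GMT_docs-6.5")` returns a dict of counts per format. A large "other" count suggests the folder holds many files the builder may not use.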

Academic Research Skills

SAGE vendors academic-research-skills under:

third_party/academic-research-skills/

The bundled OpenAI-style skills include:

deep-research — literature-grounded research planning and evidence synthesis
academic-paper — academic manuscript drafting and revision
academic-paper-reviewer — reviewer-style critique and revision guidance
academic-pipeline — end-to-end research workflow planning

Install or refresh them from the web backend:

curl -X POST http://localhost:5010/api/skills/install-academic-research \
  -H "Content-Type: application/json" \
  -d '{"overwrite": true}'
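
The same request can be sent from Python with only the standard library. This is a minimal sketch: the endpoint and payload come from the curl command above, while the helper names and the assumption that the response body is JSON are illustrative.

```python
import json
import urllib.request


def build_install_request(base_url="http://localhost:5010", overwrite=True):
    """Build the POST request that mirrors the curl command above."""
    payload = json.dumps({"overwrite": overwrite}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/skills/install-academic-research",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def install_academic_skills(base_url="http://localhost:5010", overwrite=True, timeout=60):
    """Send the request and return the decoded response.

    Assumption: the backend answers with a JSON body. Its schema is not
    documented here, so the parsed object is returned unchanged.
    """
    request = build_install_request(base_url, overwrite)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```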

After installation, Chat, Science Analysis, and CodeEngine can retrieve these skills automatically. You usually do not need to name a skill explicitly; research-oriented prompts such as "review this paper", "design a literature-backed study", or "write a JGR-style draft" should retrieve the relevant academic skills and combine them with local seismology skills and RAG.

Science Analysis and Parameter Optimization

The old geologic-interpretation page and /api/evidence_geo_agent* APIs have been removed. The legacy /evidence-geo-agent URL now redirects to the Science Analysis page:

http://localhost:5010/science-analysis-agent

For modular optimization workflows, use the Parameter Optimization Agent at:

http://localhost:5010/parameter-optimization-agent

The optimizer is meant for workflows where the user defines modules, module inputs/outputs, tunable parameters, and the final objective. The agent then asks CodeEngine to inspect the project folder, generate and debug scripts, run smoke tests, perform a bounded optimization or dry run, and save:

optimization_plan.md
best_parameters.json
optimization_history.csv
figures and logs
optimization_report.md

All outputs stay inside the selected project directory, so Science Analysis can later reuse the optimization trace, figures, and report when drafting a paper.
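
After a run, a quick check that the named artifacts exist can catch an incomplete optimization early. The sketch below uses the file names from the list above and searches the project directory recursively. The recursive search and the helper name `missing_artifacts` are assumptions, since the exact subfolder layout is not specified here.

```python
from pathlib import Path

# File names taken from the list above. "figures and logs" have no fixed
# name there, so they are not checked.
EXPECTED_ARTIFACTS = (
    "optimization_plan.md",
    "best_parameters.json",
    "optimization_history.csv",
    "optimization_report.md",
)


def missing_artifacts(project_dir):
    """Return the expected artifact names not found anywhere under project_dir."""
    root = Path(project_dir)
    found = {p.name for p in root.rglob("*") if p.is_file()}
    return [name for name in EXPECTED_ARTIFACTS if name not in found]
```

An empty list means all four named files are present somewhere in the project directory.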

Building Progress Monitoring

The build process runs in the background and provides real-time progress updates:

Scanning Phase: Detects new/modified/deleted files
Indexing Phase: Processes documents and builds vector embeddings
Skill Generation Phase: Creates skill documents from indexed content
Completion: Updates knowledge base statistics

You can monitor progress through the web interface or check logs in the terminal.

seismo_script Workflow System

The workflow system lets you define multi-step analysis pipelines as declarative .md files. Each workflow specifies which skills to load, which steps to execute, and how those steps depend on each other. The Code Engine handles all code generation and execution — the workflow simply acts as the coordination blueprint.

Role Distribution

Role	Responsibility
Workflow	Process blueprint: what steps to run, which skills to use, in what order
Skill	Specialist manual: how to use a specific tool or method
Agent	Dispatcher: matches user request to workflow, loads skills, decomposes task
Code Engine	Programmer: generates and fixes Python/GMT/Shell code for each step
Tool	Executor: Python sandbox, GMT, Shell

Workflow File Format

Workflows use .md files with YAML frontmatter — the same format as skills:

---
name: seismicity_analysis
title: Seismicity Analysis Workflow
version: "1.0"
description: Complete seismicity analysis including catalog loading, spatial/temporal distribution, and b-value estimation
keywords:
  - seismicity
  - b-value
  - epicenter map
skills:
  - name: tabular_io
    role: catalog loading and parsing
  - name: gmt_plotting
    role: epicenter map rendering
  - name: b_value_analysis
    role: b-value estimation and GR plots
steps:
  - id: load_catalog
    skill: tabular_io
    description: Load earthquake catalog from file
  - id: epicenter_map
    skill: gmt_plotting
    description: Draw epicenter distribution map
    depends_on: [load_catalog]
  - id: b_value
    skill: b_value_analysis
    description: Calculate b-value and plot GR distribution
    depends_on: [load_catalog]
---

## Seismicity Analysis Workflow Guide

Step 1: Load the catalog using `load_catalog_file()`...

Frontmatter fields:

Field	Type	Description
`name`	str	Workflow identifier
`title`	str	Human-readable title
`description`	str	One-line summary
`keywords`	list[str]	Used for relevance search
`skills`	list[{name, role}]	Required skills and their roles in this workflow
`steps`	list[{id, skill, description, depends_on}]	Execution DAG

The Markdown body is the workflow guide — injected into the LLM context to direct code generation at each step.
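
To sanity-check a workflow file before saving it, the frontmatter can be separated from the guide body and the step list validated. The sketch below is stdlib-only and is not SAGE's loader. It assumes the frontmatter has already been parsed into Python lists (for example with PyYAML's `yaml.safe_load`), and the function names are hypothetical.

```python
def split_frontmatter(text):
    """Split a workflow file into (frontmatter, body) at the two '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("workflow file must start with a '---' line")
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:]).strip()
    raise ValueError("closing '---' line not found")


def check_steps(steps, skills):
    """Return a list of problems in the parsed step list (empty if none).

    steps  : list of dicts with 'id', 'skill', and optional 'depends_on'
    skills : list of dicts with 'name' (and 'role')
    Cycles are not detected here; a topological sort would reveal them.
    """
    problems = []
    declared = {skill["name"] for skill in skills}
    all_ids = {step["id"] for step in steps}
    seen = set()
    for step in steps:
        step_id = step["id"]
        if step_id in seen:
            problems.append(f"duplicate step id: {step_id}")
        seen.add(step_id)
        if step["skill"] not in declared:
            problems.append(f"{step_id}: skill '{step['skill']}' is not declared")
        for dep in step.get("depends_on", []):
            if dep not in all_ids:
                problems.append(f"{step_id}: depends on unknown step '{dep}'")
    return problems
```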

Storage

Location	Contents
`seismo_script/workflows/`	Built-in workflows (shipped with SAGE)
`seismo_skill/skills/*/workflows/`	Built-in skill-local workflows, loaded with their owning skill package
`seismo_skill/user_skills/*/workflows/`	User/generated skill-local workflows, loaded with generated skills
Project-local `workflows/`	Web project workflows, recommended for reproducible studies and parameter optimization
`~/.seismicx/workflows/`	Legacy user workflows; new projects should prefer project-local storage

Built-in Workflows

Workflow	Description	Skills
`gmt_terrain_map`	Full GMT terrain map pipeline (7 steps: CPT → DEM cut → render → coast → contours → scale/legend → export)	`gmt_plotting`, `_gen_gmt_docs_6_5`
`seismicity_analysis`	Seismicity analysis (catalog → epicenter map → time series → b-value → cross-section)	`tabular_io`, `gmt_plotting`, `b_value_analysis`
`continuous_dataset_build`	Build a continuous waveform dataset from user data, station/catalog metadata, and labels	`continuous_seismic_monitoring`
`continuous_picker_benchmark`	Evaluate a user-supplied picker with phase-pick and event-level recall/statistics	`continuous_seismic_monitoring`
`continuous_waveform_detection`	Detect earthquakes from continuous waveform data using dataset/index artifacts	`continuous_seismic_monitoring`

`CodeEngine.run_workflow()` API

result: WorkflowRunResult = engine.run_workflow(
    workflow_name    = "seismicity_analysis",
    user_request     = "Analyze the 2024 catalog at /data/catalog.csv",
    data_hint        = "/data/catalog.csv",   # optional path hint injected into step prompts
    max_debug_rounds = 3,                     # retries per step on failure
    timeout          = 120,                   # per-step execution timeout (seconds)
    skip_on_failure  = False,                 # if True, skip failed steps instead of aborting
    on_progress      = callback_fn,           # optional: called with progress dicts
)

run_workflow() topo-sorts the step DAG, then for each step:

Checks all depends_on predecessors have succeeded
Scans the shared execution directory for available output files
Calls build_skill_context_with_rag() for the step's declared skill
Generates code via LLM (skill context + completed-steps summary injected)
Executes code in a shared directory (so step N+1 can read files written by step N)
On failure: re-queries RAG with the error text appended, retries up to max_debug_rounds
Records a StepResult and appends it to the shared conversation history
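
The ordering and predecessor check can be sketched with the standard-library `graphlib` module (Python 3.9+). This illustrates the technique under the assumption that steps are dicts shaped like the frontmatter entries. It is not the code inside `run_workflow()`, and both function names are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(steps):
    """Return step ids in an order that respects depends_on.

    TopologicalSorter raises graphlib.CycleError if the steps form a cycle.
    """
    graph = {step["id"]: set(step.get("depends_on", [])) for step in steps}
    return list(TopologicalSorter(graph).static_order())


def runnable(step, succeeded):
    """A step may run only if every predecessor id is in the succeeded set."""
    return all(dep in succeeded for dep in step.get("depends_on", []))
```

For the `seismicity_analysis` example, `load_catalog` always comes first. `epicenter_map` and `b_value` may appear in either order because neither depends on the other.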

WorkflowRunResult:

from dataclasses import dataclass
from typing import List

@dataclass
class WorkflowRunResult:
    workflow_name: str
    steps:         List[StepResult]   # one entry per executed step
    shared_dir:    str                # directory where all step output files live
    total_time:    float              # total wall-clock time (seconds)

    @property
    def failed_steps(self)  -> List[StepResult]: ...
    @property
    def skipped_steps(self) -> List[StepResult]: ...

StepResult:

from dataclasses import dataclass, field
from typing import List

@dataclass
class StepResult:
    step_id:      str
    skill:        str
    description:  str
    success:      bool
    code:         str
    stdout:       str = ""
    stderr:       str = ""
    figures:      List[str] = field(default_factory=list)
    output_files: List[str] = field(default_factory=list)
    attempts:     int = 1
    diagnosis:    str = ""
    skipped:      bool = False

Web API

Trigger a workflow run:

POST /api/chat/workflow
Content-Type: application/json

{
  "workflow_name":   "seismicity_analysis",
  "message":         "Analyze the 2024 Sichuan catalog at /data/catalog.csv",
  "session_id":      "optional-session-id",
  "data_hint":       "/data/catalog.csv",
  "skip_on_failure": false
}

Response: { "ok": true, "job_id": "wf_xxxx" }

Poll for results (same endpoint as single-step code jobs):

GET /api/chat/code/poll/<job_id>

Response (completed):
{
  "status": "completed",
  "result": {
    "step_results": [
      { "step_id": "load_catalog", "success": true,  "figures": [...], "stdout": "..." },
      { "step_id": "epicenter_map","success": true,  "figures": ["/path/map.png"], "stdout": "" },
      { "step_id": "b_value",      "success": false, "diagnosis": "mc too high", "attempts": 3 }
    ],
    "shared_dir": "/tmp/sage_wf_xxxxx"
  }
}
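
A small client can poll the job until it completes and then list the failed steps. This is a sketch under stated assumptions: only the "completed" status is documented above, so every other status is treated as "still running", and the helper names, polling interval, and poll limit are illustrative.

```python
import json
import time
import urllib.request


def fetch_json(url, timeout=30):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_job(job_id, base_url="http://localhost:5010", fetch=fetch_json,
                 interval=2.0, max_polls=300):
    """Poll the job endpoint until status is "completed"; return its result.

    Assumption: any status other than "completed" means the job is still
    running. A backend-specific failure status would need its own branch.
    """
    url = f"{base_url}/api/chat/code/poll/{job_id}"
    for _ in range(max_polls):
        reply = fetch(url)
        if reply.get("status") == "completed":
            return reply["result"]
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not complete after {max_polls} polls")


def failed_step_ids(result):
    """Return the step_id of every step whose success flag is not true."""
    return [s["step_id"] for s in result["step_results"] if not s.get("success")]
```

Passing `fetch` as a parameter keeps the polling logic testable without a running server.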

Creating Custom Workflows

Method 1: Web Interface (Recommended)

Visit /skills → Workflows tab → Click "New Workflow" → Fill in metadata → Edit the Markdown guide body. The step DAG preview updates live as you edit the frontmatter.

Method 2: Write .md File Directly

Save to the current project's workflows/<name>.md using the frontmatter format shown above. Legacy ~/.seismicx/workflows/<name>.md files are still readable, but new projects should stay self-contained.

GMT Map Drawing

SAGE calls GMT 6 directly through the run_gmt() utility function to generate publication-quality seismological maps.

Installing GMT

# macOS
brew install gmt

# Linux (Conda environment)
conda install -c conda-forge gmt

# Linux (apt)
sudo apt install gmt

Usage

Describe what you need in conversation, and SAGE generates and executes the GMT script automatically:

> Help me draw a topographic map of China with GMT
> Draw epicenter distribution map for 90-120°E, 20-45°N
> Draw station distribution map with GMT, data in /data/stations.txt

Or call it from code (run_gmt is pre-injected, so no import is needed):

gmt_script = """
gmt begin china_topo PNG
  gmt grdcut @earth_relief_01m -R70/140/15/55 -Gtopo.grd
  gmt grdimage topo.grd -JM16c -Cetopo1 -I+d
  gmt coast -W0.5p,gray40 -N1/0.8p -Baf -BWSne+t"China Topographic Map"
  gmt colorbar -DJBC+w8c/0.4c -Baf+l"Elevation (m)"
gmt end
"""

run_gmt(gmt_script, outname="china_topo", title="China Topographic Map")

Automatic Chinese Title Processing

GMT's PostScript engine does not render CJK characters out of the box. SAGE works around this automatically:

Extracts Chinese titles and labels from the script before execution
Replaces them with empty placeholders so GMT renders the map content without garbled characters
After execution, overlays the Chinese titles back onto the PNG with matplotlib

You do not need to handle this yourself; write Chinese titles directly in the script.
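
The extract-and-replace step can be illustrated with a regular expression over the script text. This is a sketch, not SAGE's implementation: it only handles `+t"..."` title modifiers, only checks the basic CJK Unified Ideographs block, and the `+t" "` placeholder form is an assumption. The matplotlib overlay step is not shown.

```python
import re

# Matches a +t"..." title modifier whose text contains at least one
# character from the basic CJK Unified Ideographs block (U+4E00..U+9FFF).
CJK_TITLE = re.compile(r'\+t"([^"]*[\u4e00-\u9fff][^"]*)"')


def strip_cjk_titles(script):
    """Return (script with CJK titles blanked, list of the removed titles)."""
    titles = []

    def blank(match):
        titles.append(match.group(1))
        return '+t" "'  # assumed placeholder; titles without CJK are untouched

    return CJK_TITLE.sub(blank, script), titles
```

The returned titles keep their original order, so a later overlay step can pair each one with the figure it came from.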

Image and Script Download

The toolbar below each GMT image provides:

⬇ Image: download the PNG file
⬇ GMT Script: download the .sh script, which can be run independently in a terminal to reproduce the map

Core Modules Details

`seismo_script/` — Workflow System

seismo_script/
├── workflow_runner.py  # Workflow discovery, search, CRUD, and context building
├── workflows/          # Built-in workflow .md files (gmt_terrain_map, seismicity_analysis, ...)
└── __init__.py         # Public API: list_workflows, search_workflows, load_workflow,
                        #   save_user_workflow, delete_user_workflow, build_workflow_context

Public API summary:

Function	Description
`list_workflows()`	Return all workflow metadata (no guide body)
`search_workflows(query, top_k)`	Rank workflows by keyword relevance
`load_workflow(name)`	Return full workflow entry including guide text
`save_user_workflow(name, text)`	Save a `.md` workflow into the project directory; legacy user storage remains readable
`delete_user_workflow(name)`	Delete a user-defined workflow
`build_workflow_context(query)`	Return `(context_str, skill_names)` for LLM injection

`seismo_code/` — Code Generation and Execution Engine

seismo_code/
├── code_engine.py      # LLM code generation (skill injection, multi-round history, error retry,
│                       #   run_workflow() multi-step DAG execution)
├── safe_executor.py    # Sandbox execution (independent subprocess, 120s timeout, automatic image collection)
├── toolkit.py          # Built-in seismological utility functions (no import needed, direct call)
└── doc_parser.py       # Extract context snippets related to code tasks from PDF

Built-in Toolkit (toolkit.py, automatically injected during code execution):

Category	Functions
Data Reading	`read_stream`, `read_stream_from_dir`
Waveform Processing	`detrend_stream`, `taper_stream`, `filter_stream`, `resample_stream`, `trim_stream`, `remove_response`
Visualization	`plot_stream`, `plot_spectrogram`, `plot_psd`, `plot_particle_motion`, `plot_travel_time_curve`
Travel Time Calculation	`taup_arrivals`, `p_travel_time`, `s_travel_time`
Spectrum Analysis	`compute_spectrum`, `compute_hvsr`
Source Parameters	`estimate_magnitude_ml`, `estimate_corner_freq`, `estimate_seismic_moment`, `moment_to_mw`, `estimate_stress_drop`
GMT Plotting	`run_gmt`
Utility Functions	`stream_info`, `picks_to_dict`, `savefig`

Sandbox Execution Mechanism:

Code runs in an independent subprocess, so a crash does not affect the main process
Timeout protection (120 seconds by default)
Generated images are collected automatically via the [FIGURE] /path marker and sent to the frontend
GMT scripts are collected separately via the [GMT_SCRIPT] /path marker and offered for download in the frontend
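
The mechanism can be sketched with `subprocess.run`: run the code in a separate interpreter, enforce a timeout, and scan stdout for marker lines. This is a simplified illustration, not the contents of safe_executor.py. The function name and the shape of the returned dict are assumptions. A plain subprocess isolates crashes but is not a security boundary on its own.

```python
import subprocess
import sys

FIGURE_MARKER = "[FIGURE] "


def run_sandboxed(code, timeout=120):
    """Run code in a separate Python process and collect [FIGURE] paths."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,  # the child is killed when the timeout expires
        )
    except subprocess.TimeoutExpired:
        return {"success": False, "stdout": "", "stderr": "timeout", "figures": []}

    # Every stdout line that starts with the marker carries one image path.
    figures = [
        line[len(FIGURE_MARKER):].strip()
        for line in proc.stdout.splitlines()
        if line.startswith(FIGURE_MARKER)
    ]
    return {
        "success": proc.returncode == 0,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "figures": figures,
    }
```

The same scan with a `[GMT_SCRIPT] ` prefix would collect script paths.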

`seismo_agent/` — Autonomous Agent

A fully automated flow from literature to working code:

seismo_agent/
├── paper_reader.py   # Literature loading (PDF / arXiv ID / DOI / plain text)
├── memory.py         # Cross-step work memory (literature content, step results, generated variables)
├── planner.py        # LLM task planning (goal + literature summary → JSON step list)
└── agent_loop.py     # Main loop (planning → code → execution → failure retry → summary)

Execution flow:

User goal + literature source (PDF / arXiv / DOI)
       │
  Load and extract core literature content
       │
  LLM plans execution steps (3–8 steps, JSON format)
       │
  ┌─── Each Step ──────────────────────────────────────┐
  │  Retrieve relevant skill documents (seismo_skill)  │
  │  LLM generates code (skill context injection)      │  ← Retry up to 2 times on failure
  │  Sandboxed execution                               │
  │  Record results and generated images               │
  └────────────────────────────────────────────────────┘
       │ Loop through all steps
  Summary report + output directory
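
The per-step retry logic in the diagram can be sketched as a plain loop. The `generate` and `execute` callables are hypothetical stand-ins for the LLM code generator and the sandbox, and the record format is an assumption. agent_loop.py may be organized differently.

```python
def run_plan(steps, generate, execute, max_retries=2):
    """Run each planned step, retrying a failed step up to max_retries times.

    generate(step, error) -> code string; error is the previous failure text or None
    execute(code)         -> (success, error_text)
    A step that still fails is recorded and the loop moves on to the next one.
    """
    records = []
    for step in steps:
        error = None
        success = False
        attempts = 0
        while attempts <= max_retries and not success:
            attempts += 1
            code = generate(step, error)     # feed the last error back to the generator
            success, error = execute(code)
        records.append({"step": step, "success": success, "attempts": attempts})
    return records
```

With `max_retries=2`, a step is attempted at most three times: one initial run plus two retries.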

`web_app/rag_engine.py` — Knowledge Base RAG Engine

Stage	Implementation
PDF Parsing	pdfminer.six (priority) / PyMuPDF (fallback)
Text Chunking	500 chars/chunk, 50 char sliding overlap
Vectorization	BGE-M3 (1024 dimensions, L2 normalized, Chinese-English bilingual)
Indexing	FAISS `IndexFlatIP` (inner product = cosine similarity)
Retrieval	Top-K recall + similarity threshold filtering, only showing truly matched literature
Persistence	`~/.seismicx/knowledge/`, automatic load on startup; automatic cleanup of orphaned vectors from deleted files on startup
Fallback	Automatic downgrade to TF-IDF cosine similarity retrieval when BGE-M3 unavailable
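
Two rows of the table lend themselves to a short illustration: sliding-window chunking (500 characters with a 50-character overlap) and the fact that the inner product of L2-normalized vectors equals their cosine similarity. The sketch uses plain Python lists and hypothetical function names. rag_engine.py itself relies on BGE-M3 and FAISS.

```python
import math


def chunk_text(text, size=500, overlap=50):
    """Split text into chunks of `size` chars, each sharing `overlap` chars with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def inner_product(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))
```

Because the vectors are normalized before indexing, an inner-product index ranks results exactly as a cosine-similarity search would.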

`seismo_stats/` — Seismic Statistical Analysis

seismo_stats/
├── bvalue.py         # Mc (maximum curvature / goodness-of-fit) + b-value (MLE / LSQ) + σ_b uncertainty
├── catalog_loader.py # Directory loading: CSV / JSON / picks.txt, automatic column name recognition
└── plotting.py       # F-M distribution plots, temporal activity plots, epicenter distribution plots
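
For orientation, the maximum-likelihood b-value can be written in a few lines. This is a sketch of the standard Aki-Utsu estimator with the b/√N first-order uncertainty. The function name and signature are hypothetical, and the method and uncertainty formula used in bvalue.py may differ.

```python
import math


def b_value_mle(magnitudes, mc, dm=0.1):
    """Aki-Utsu maximum-likelihood b-value for events with magnitude >= mc.

    b = log10(e) / (mean(M) - (mc - dm / 2)), where dm is the magnitude bin width.
    Returns (b, sigma_b) with sigma_b = b / sqrt(N).
    """
    sample = [m for m in magnitudes if m >= mc]
    if len(sample) < 2:
        raise ValueError("need at least two events at or above mc")
    mean_mag = sum(sample) / len(sample)
    denom = mean_mag - (mc - dm / 2.0)
    if denom <= 0:
        raise ValueError("mean magnitude must exceed mc - dm/2")
    b = math.log10(math.e) / denom
    sigma_b = b / math.sqrt(len(sample))
    return b, sigma_b
```

The estimate is sensitive to the choice of Mc, so Mc should be determined before the b-value is computed.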

`seismo_tools/` — External Tool Registry

Provides unified management of third-party seismological tools such as HypoDD, VELEST, and HASH. It supports automatic control-file generation, invocation of external executables, and parsing of output results, and it can be triggered from conversation.

Directory Structure

sage/
├── web_app/                      # Web service
│   ├── app.py                    # Flask main application (40+ API routes)
│   ├── rag_engine.py             # BGE-M3 + FAISS knowledge base engine
│   ├── simple_rag.py             # TF-IDF fallback RAG
│   ├── simple_vector_db.py       # Lightweight vector database (pickle persistence)
│   └── templates/
│       ├── chat.html             # Conversation page (main interface)
│       ├── knowledge.html        # Knowledge base management
│       ├── skills.html           # Skill management
│       └── llm_settings.html     # LLM configuration
│
├── seismo_skill/                 # Skill documentation system
│   ├── skill_loader.py           # Parse, retrieve, inject (Chinese-English mixed retrieval)
│   ├── __init__.py
│   ├── waveform_io.md            # Waveform reading
│   ├── waveform_processing.md    # Waveform preprocessing
│   ├── waveform_visualization.md # Waveform visualization
│   ├── spectral_analysis.md      # Spectrum analysis & HVSR
│   ├── b_value_analysis.md       # b-value statistical analysis
│   ├── source_parameters.md      # Source parameter estimation
│   ├── tabular_io.md             # CSV / TXT data reading
│   └── gmt_plotting.md           # GMT map drawing
│
├── seismo_script/                # Workflow system
│   ├── workflow_runner.py        # Workflow discovery, search, CRUD, context building
│   ├── workflows/                # Built-in workflow .md files
│   │   ├── gmt_terrain_map.md    # GMT terrain map 7-step pipeline
│   │   └── seismicity_analysis.md # Seismicity analysis pipeline
│   └── __init__.py
│
├── seismo_code/                  # Code generation and execution engine
│   ├── code_engine.py            # LLM code generation (multi-round history + error retry
│   │                             #   + engineering_plan.md + run_workflow() DAG execution)
│   ├── safe_executor.py          # Sandbox execution (subprocess + timeout protection)
│   ├── toolkit.py                # Built-in seismological utility functions
│   └── doc_parser.py             # PDF content extraction
│
├── seismo_agent/                 # Autonomous Agent
│   ├── agent_loop.py             # Main loop (SeismoAgent class)
│   ├── planner.py                # Task planning (TaskPlanner)
│   ├── memory.py                 # Work memory (AgentMemory)
│   └── paper_reader.py           # Literature loading (load_paper)
│
├── seismo_stats/                 # Seismic statistical analysis
│   ├── bvalue.py                 # b-value / Mc calculation
│   ├── catalog_loader.py         # Earthquake catalog loading
│   └── plotting.py               # Statistical chart plotting
│
├── seismo_tools/                 # External tool registry
│   └── tool_registry.py          # HypoDD / VELEST / HASH etc.
│
├── docs/
│   └── ARCHITECTURE.md           # Software structure and coding-agent design contract
│
├── seismo_skill/
│   ├── skills/                   # Built-in OpenAI-style skills
│   │   └── pnsn_phase_detection/
│   │       ├── SKILL.md
│   │       └── pnsn/             # Skill-local pnsn code and models
│   ├── user_skills/              # Web-generated/imported user skills
│   │   └── _gen_gmt_docs_zh/
│   │       ├── SKILL.md
│   │       └── subskills/
│   └── docs/                     # Documentation sources convertible to RAG/SKILL
│
├── conversational_agent.py       # Conversation Agent core (intent classification + skill execution)
├── config_manager.py             # LLM configuration management
├── backend_manager.py            # Multi-backend support (Ollama / vLLM / online API)
├── seismic_cli.py                # Command line entry point
├── requirements.txt              # Python dependencies
└── logo.png

.sage_runtime/                    # Local background PID, logs, env info, and plan cache (git ignored)
seismo_rag/                       # Project knowledge indexes and project_config.json

Configuration Files

Configuration is split into project-level and user-level files. The web Config page maintains them automatically, so manual editing is usually unnecessary.

{
  "llm": {
    "provider": "ollama",
    "model": "qwen3:30b",
    "api_base": "http://localhost:11434",
    "api_key": ""
  },
  "workspace": {
    "enabled": true,
    "path": "/data/seismic"
  }
}

Common locations:

seismo_rag/project_config.json: project-level settings such as search providers, coding backend preferences, and SKILL/RAG helper options.
.sage_runtime/: local PID files, logs, and transient runtime environment data; this directory is ignored by git.
Scientific Analysis / Parameter Optimization project folders: project inputs, outputs, figures, logs, Markdown/LaTeX drafts, and optimization traces.
~/.seismicx/config.json: a small set of user-level defaults, such as the default LLM backend.

Field	Description	Optional Values
`llm.provider`	LLM provider	`ollama` / `openai` / `custom`
`llm.model`	Model name	Ollama tag or API model name
`llm.api_base`	API endpoint address	`http://localhost:11434` (Ollama default)
`llm.api_key`	API key	Not required for Ollama
`workspace.enabled`	Whether to allow LLM to access local file lists	`true` / `false`
`workspace.path`	Authorized root directory (LLM cannot access content outside this path)	Absolute path string
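
The `workspace.path` restriction amounts to a path-containment check. The sketch below shows one way to express it with `pathlib`. It illustrates the idea only and is not SAGE's access-control code, and both function names are hypothetical.

```python
from pathlib import Path


def is_within_workspace(candidate, workspace_root):
    """True if candidate resolves to the workspace root or a path beneath it."""
    root = Path(workspace_root).resolve()
    target = Path(candidate).resolve()  # collapses ".." segments and symlinks
    return target == root or root in target.parents


def workspace_allows(config, candidate):
    """Apply the workspace.enabled and workspace.path fields of a loaded config dict."""
    workspace = config.get("workspace", {})
    path = workspace.get("path")
    if not workspace.get("enabled") or not path:
        return False
    return is_within_workspace(candidate, path)
```

Resolving both paths first matters: a plain string-prefix test would accept `/data/seismic_other` and miss `..` escapes.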

FAQ

Q: Conversation returns "No available LLM model configured"

Go to /config to select an installed Ollama model, or configure an online API and click "Save Configuration".

Q: English questions like "what is filter algorithm?" are incorrectly routed to code execution

Fixed. SAGE now uses the LLM rather than keyword regexes to determine intent, so conceptual questions that contain technical terms such as "filter" or "spectrum" are routed to knowledge Q&A rather than code execution.

Q: Knowledge base PDF vectorization is slow after upload

The first run downloads the BGE-M3 model (~2 GB) from HuggingFace; vectorization runs at normal speed once the download completes. Users in mainland China can set a mirror to speed up the download:

export HF_ENDPOINT=https://hf-mirror.com

If HuggingFace is completely inaccessible, install ModelScope (pip install modelscope) so SAGE can automatically download BAAI/bge-m3 from ModelScope into open_models/bge-m3, or download it manually (see Alternative: Download BGE-M3 via ModelScope).

Q: Chinese titles in GMT images show as garbled characters

No special handling is required. SAGE has built-in CJK handling: during GMT execution, Chinese text is replaced with empty placeholders, and after execution matplotlib overlays the Chinese titles back onto the PNG so that they display correctly.

Q: GMT plotting fails, prompting "GMT not installed"

Install GMT >= 6.0:

# macOS
brew install gmt

# Linux (conda environment)
conda install -c conda-forge gmt

Q: Batch picking is slow

Picking runs on the CPU by default. Add --device cuda to enable GPU acceleration (requires a CUDA environment and a matching PyTorch build).

Q: Agent step execution fails

By default, the Agent retries each step up to 2 times; steps that still fail are skipped, and the remaining steps continue. You can raise the --max-steps limit or check the logs in the output directory for details.

Q: How to make AI use my own function library?

Create an OpenAI-style SKILL.md under seismo_skill/user_skills/<skill_name>/, following Creating Custom Skills to describe when to use it, inputs/outputs, workflow steps, and minimal examples. Refresh the Skills page to manage it; Chat, CodeEngine, and Scientific Analysis Agent can retrieve and use it.

Q: RAG function reports error "embedding model library not found"

# 1. Confirm installation
pip list | grep -E "(FlagEmbedding|sentence-transformers)"

# 2. Try upgrade
pip install --upgrade FlagEmbedding sentence-transformers

# 3. If Rust compiler needed
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env
pip install FlagEmbedding sentence-transformers

If none of the above works, the project's built-in lightweight TF-IDF vector database serves as an automatic fallback, so basic RAG functionality remains available.

Acknowledgements

SAGE's integrated Aider backend builds on Aider, an open-source AI pair programming tool for terminal and Git workflows. SAGE vendors the Aider source under third_party/aider and integrates it through its Python scripting API where available, with installed-package and CLI fallbacks for compatibility. The experimental OpenHands backend is designed to interoperate with OpenHands. We thank these open-source communities for making stronger coding-agent workflows possible.

Contact

SeismicX is developed by:

Yuqi Cai - caiyuqiming@foxmail.com
Xin Liu - xinliu_geo@outlook.com
Ziye Yu - yuziye@cea-igp.ac.cn

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.

Q: How to add AI support for external tools like HypoDD?

Call register_tool() in seismo_tools/tool_registry.py to register the tool's parameter templates and invocation commands. Also create a corresponding skill document in seismo_skill/ that describes the input file formats, so the AI can reference it automatically during code generation.

Built with ❤️ for the seismology community
