16 changes: 15 additions & 1 deletion docs/local-dev.md
@@ -57,7 +57,7 @@ sh -n scripts/install-amesh-node.sh
- The published remote bootstrap path is `curl .../install-amesh-node.sh | ... bash`, so the installer must keep working when Bash reads it from stdin instead of from a file.
- The installer now logs whether it is reusing or creating config/state, and on systemd hosts it fails the install if the user service does not remain active after startup. When that happens it prints both `systemctl --user status` and recent `journalctl --user -u amesh-node` output.
- `install-amesh-node.sh` also normalizes `~/.acpx/config.json` so ACPX non-interactive health probes start from a valid baseline on first install.
- Detected agents now persist the registering shell's `PATH` into node config. This avoids later service-only regressions where a systemd user unit resolves a different `node` binary than the interactive shell that successfully ran the same agent CLI.
- Detected agents now persist the registering shell's `PATH` into node config and prepend the resolved executable directories for the detected agent CLI and `node`. This avoids later service-only regressions where a systemd user unit or an `fnm` multishell shim resolves a different or stale Node runtime than the interactive shell that successfully ran the same agent CLI.
- ACP aliases for external clients can be served locally with `go run ./cmd/amesh acp <alias>`. The default alias registry is `~/.config/amesh/acp.json`:

```json
@@ -133,3 +133,17 @@ amesh-node update

Authenticated admins can also trigger the same node-side updater from the dashboard. The control plane sends a `node.update` command over the existing node websocket, the daemon runs `amesh-node update`, and a managed systemd service should restart back into the new binary after the process exits.
- The dashboard only shows the update action when the node reports an installed release tag and that tag differs from the control plane's latest known GitHub release tag.
- Daemon-triggered self-updates reuse the node's active `server`, `config`, and `state` paths and deliberately avoid `systemctl stop` during the update run. The daemon exits after the installer finishes and systemd restarts it into the new binary.

## Remote reinstall

```bash
amesh-node reinstall
```

The shared CLI also exposes the same command as `amesh reinstall`.

`reinstall` is the destructive recovery path for a stale or suspect node install. It stops and disables the managed user service, removes the node service file, durable node state, detected agent config, installed `amesh-node` and `amesh` binaries, and the managed `~/.local/share/amesh` payload, then runs the installer again from scratch.
- Use `reinstall` when you suspect stale node state, stale detected agent inventory, or broken managed ACPX/node wiring.
- `reinstall` preserves the user ACPX config at `~/.acpx/config.json`; it only wipes amesh-managed node artifacts.
- On success, the installer re-detects agents, re-registers the node, rewrites the service, and starts the managed daemon again.
6 changes: 6 additions & 0 deletions docs/past-failures.md
@@ -113,6 +113,12 @@
- Consequence: a fresh node could advertise agents, yet the dashboard showed runtime errors like `/usr/bin/env: 'node': No such file or directory` or `toSorted is not a function` once the daemon tried to execute them.
- Mitigation: detected agent configs now persist the working shell `PATH`, and the installer now fails fast unless `node` `22.x+` is available before it installs the daemon service. Covered by a Go detection test that asserts the saved agent env includes the original `PATH`.
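The `node` `22.x+` gate comes down to reading the major version out of `node --version`, which prints strings like `v22.11.0`. The installer performs this check in shell, and that code is not shown here; `nodeMajorAtLeast` below is a hypothetical Go sketch of the same rule:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// nodeMajorAtLeast reports whether a `node --version` string such as
// "v22.11.0" has a major version of at least minMajor. Hypothetical helper.
func nodeMajorAtLeast(version string, minMajor int) (bool, error) {
	trimmed := strings.TrimPrefix(strings.TrimSpace(version), "v")
	majorText, _, _ := strings.Cut(trimmed, ".")
	major, err := strconv.Atoi(majorText)
	if err != nil {
		return false, fmt.Errorf("unrecognized node version %q: %w", version, err)
	}
	return major >= minMajor, nil
}

func main() {
	// Sample outputs; a real check would capture `node --version` via os/exec.
	for _, version := range []string{"v22.11.0", "v20.18.1", "v23.0.0\n"} {
		ok, err := nodeMajorAtLeast(version, 22)
		fmt.Printf("%q -> ok=%v err=%v\n", version, ok, err)
	}
}
```

Treating unparseable output as an error, rather than as "too old", keeps a broken `node` shim distinguishable from a genuinely outdated runtime in the install log.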

## 2026-05-15: `fnm` multishell shims made saved agent PATH entries go stale

- Symptom: daemon-side health probes failed with `/usr/bin/env: 'node': No such file or directory` even though detection succeeded in an `fnm` shell and the saved config already included `PATH`.
- Cause: detection persisted the shell's raw `PATH` order. In `fnm` environments, that order can place transient multishell shim directories ahead of the stable Node installation path, so later daemon runs reused a dead shim directory.
- Mitigation: detected agent env now prepends the resolved executable directories for both the agent CLI and `node`, then appends the original shell `PATH` as fallback. Covered by a Go regression test that simulates `fnm`-style symlink shims.

## 2026-05-11: Node inventory had no lightweight way to express multiple working directories

- The node config only described base agents, so a single machine could not advertise the same local agent across multiple useful workspaces without hand-editing duplicate agent entries.
2 changes: 2 additions & 0 deletions docs/testing.md
@@ -38,3 +38,5 @@
- The web app also covers the top-bar MCP config panel so the copy-paste client snippets stay aligned with the server endpoint and scope headers.
- The Go daemon owns table-driven tests for config loading, reconnect logic, update, detect, exposed-path command dispatch, and `acpx` process lifecycle including streamed output and cancellation.
- The dev helper script also has a regression shell test for the stale local reconnect-token path, so local `pnpm dev:daemon` re-registers automatically after a fresh control-plane reset.
- The Go daemon also covers the shared `reinstall` subcommand and verifies that reinstall mode passes the destructive reset flag through to the installer.
- `scripts/test-install-amesh-node.sh` also covers remote self-update and full reinstall flows, including reinstall-time cleanup of stale node state, config, service, binaries, and managed amesh home.
39 changes: 28 additions & 11 deletions install-amesh-node.sh
@@ -18,6 +18,8 @@ SERVICE_PATH="${SERVICE_PATH:-$HOME/.config/systemd/user/${SERVICE_NAME}.service
SERVER_URL="${SERVER_URL:-}"
REGISTRATION_TOKEN="${REGISTRATION_TOKEN:-}"
NODE_ID="${NODE_ID:-$(hostname)-amesh}"
SELF_UPDATE="${AMESH_NODE_SELF_UPDATE:-0}"
REINSTALL="${AMESH_NODE_REINSTALL:-0}"

log() {
printf '%s\n' "$*" >&2
@@ -194,7 +196,7 @@ main() {
need_cmd install
need_cmd mkdir

if [[ -z "$SERVER_URL" ]]; then
if [[ -z "$SERVER_URL" && ! -f "$STATE_PATH" ]]; then
fail "SERVER_URL is required"
fi

@@ -219,6 +221,16 @@ main() {
tmp_dir="$(mktemp -d)"
trap 'rm -rf "${tmp_dir}"' EXIT

if [[ "$REINSTALL" == "1" ]]; then
log "reinstall requested; removing existing node install artifacts"
if command -v systemctl >/dev/null 2>&1; then
systemctl --user stop "$SERVICE_NAME" >/dev/null 2>&1 || true
systemctl --user disable "$SERVICE_NAME" >/dev/null 2>&1 || true
fi
rm -f "$SERVICE_PATH" "$STATE_PATH" "$CONFIG_PATH" "$binary_path" "$cli_binary_path"
rm -rf "$AMESH_HOME"
fi

mkdir -p "${install_dir}"
mkdir -p "${AMESH_HOME}"
mkdir -p "$(dirname "$STATE_PATH")"
@@ -246,7 +258,7 @@ main() {
install -m 0755 "${extract_dir}/${cli_binary_name}" "${cli_binary_path}"
fi

if command -v systemctl >/dev/null 2>&1; then
if command -v systemctl >/dev/null 2>&1 && [[ "$SELF_UPDATE" != "1" ]]; then
systemctl --user stop "$SERVICE_NAME" >/dev/null 2>&1 || true
fi

@@ -309,16 +321,21 @@ EOF

if command -v systemctl >/dev/null 2>&1; then
systemctl --user daemon-reload
systemctl --user enable --now "$SERVICE_NAME"
sleep 2
if ! systemctl --user --quiet is-active "$SERVICE_NAME"; then
log "service failed to stay active: $SERVICE_NAME"
systemctl --user --no-pager --full status "$SERVICE_NAME" >&2 || true
journalctl --user -u "$SERVICE_NAME" -n 80 --no-pager >&2 || true
fail "amesh-node user service did not reach active state"
if [[ "$SELF_UPDATE" == "1" ]]; then
systemctl --user enable "$SERVICE_NAME"
log "prepared user service restart after self-update: $SERVICE_NAME"
else
systemctl --user enable --now "$SERVICE_NAME"
sleep 2
if ! systemctl --user --quiet is-active "$SERVICE_NAME"; then
log "service failed to stay active: $SERVICE_NAME"
systemctl --user --no-pager --full status "$SERVICE_NAME" >&2 || true
journalctl --user -u "$SERVICE_NAME" -n 80 --no-pager >&2 || true
fail "amesh-node user service did not reach active state"
fi
log "installed and started user service: $SERVICE_NAME"
log "service logs: journalctl --user -u ${SERVICE_NAME} -f"
fi
log "installed and started user service: $SERVICE_NAME"
log "service logs: journalctl --user -u ${SERVICE_NAME} -f"
else
log "systemctl not found; service file written to $SERVICE_PATH"
log "start manually: AMESH_ACPX_PATH='${ACPX_BIN}' '${binary_path}' run --state '${STATE_PATH}'"
122 changes: 103 additions & 19 deletions internal/app/app.go
@@ -41,7 +41,16 @@ type sleeper func(ctx context.Context, delay time.Duration) error

type capabilityProber func(ctx context.Context, agent nodeconfig.AgentConfig) error

type updateRunner func(ctx context.Context, stdout, stderr io.Writer) error
type nodeUpdateOptions struct {
ServerURL string
NodeID string
ConfigPath string
StatePath string
SelfUpdate bool
Reinstall bool
}

type updateRunner func(ctx context.Context, stdout, stderr io.Writer, options nodeUpdateOptions) error
type detectRunner func(ctx context.Context, configPath string) error

type retryableDaemonError struct {
@@ -88,7 +97,7 @@ func Run(ctx context.Context, args []string) error {

func run(ctx context.Context, args []string, update updateRunner, detect detectRunner) error {
if len(args) == 0 {
return errors.New("expected subcommand: register, run, detect, update, or acp")
return errors.New("expected subcommand: register, run, detect, update, reinstall, logs, or acp")
}

switch args[0] {
@@ -99,7 +108,9 @@ func run(ctx context.Context, args []string, update updateRunner, detect detectR
case "detect":
return runDetectCommand(ctx, args[1:], detect)
case "update":
return update(ctx, os.Stdout, os.Stderr)
return update(ctx, os.Stdout, os.Stderr, nodeUpdateOptions{})
case "reinstall":
return update(ctx, os.Stdout, os.Stderr, nodeUpdateOptions{Reinstall: true})
case "acp":
return runACPBridge(ctx, args[1:], os.Stdin, os.Stdout)
case "logs":
@@ -159,7 +170,16 @@ func runACPBridge(ctx context.Context, args []string, stdin io.Reader, stdout io
return bridge.Serve(ctx, stdin, stdout)
}

func runUpdate(ctx context.Context, stdout, stderr io.Writer) error {
func runUpdate(ctx context.Context, stdout, stderr io.Writer, options nodeUpdateOptions) error {
return runInstaller(ctx, stdout, stderr, options, options.Reinstall)
}

func runReinstall(ctx context.Context, stdout, stderr io.Writer, options nodeUpdateOptions) error {
options.Reinstall = true
return runInstaller(ctx, stdout, stderr, options, true)
}

func runInstaller(ctx context.Context, stdout, stderr io.Writer, options nodeUpdateOptions, reinstall bool) error {
if _, err := exec.LookPath("bash"); err != nil {
return errors.New("required CLI missing: bash")
}
@@ -180,14 +200,39 @@ func runUpdate(ctx context.Context, stdout, stderr io.Writer) error {
cmd.Stdout = stdout
cmd.Stderr = stderr
cmd.Env = append(os.Environ(), "AMESH_INSTALL_URL="+installerURL)
if strings.TrimSpace(options.ServerURL) != "" && os.Getenv("SERVER_URL") == "" {
cmd.Env = append(cmd.Env, "SERVER_URL="+options.ServerURL)
}
if strings.TrimSpace(options.NodeID) != "" && os.Getenv("NODE_ID") == "" {
cmd.Env = append(cmd.Env, "NODE_ID="+options.NodeID)
}
if strings.TrimSpace(options.ConfigPath) != "" && os.Getenv("CONFIG_PATH") == "" {
cmd.Env = append(cmd.Env, "CONFIG_PATH="+options.ConfigPath)
}
if strings.TrimSpace(options.StatePath) != "" && os.Getenv("STATE_PATH") == "" {
cmd.Env = append(cmd.Env, "STATE_PATH="+options.StatePath)
}
if options.SelfUpdate {
cmd.Env = append(cmd.Env, "AMESH_NODE_SELF_UPDATE=1")
}
if reinstall {
cmd.Env = append(cmd.Env, "AMESH_NODE_REINSTALL=1")
}
if os.Getenv("INSTALL_DIR") == "" {
if installDir, ok := currentInstallDir(); ok {
cmd.Env = append(cmd.Env, "INSTALL_DIR="+installDir)
}
}

fmt.Fprintf(stdout, "updating amesh-node from %s\n", installerURL)
action := "updating"
if reinstall {
action = "reinstalling"
}
fmt.Fprintf(stdout, "%s amesh-node from %s\n", action, installerURL)
if err := cmd.Run(); err != nil {
if reinstall {
return fmt.Errorf("reinstall failed: %w", err)
}
return fmt.Errorf("update failed: %w", err)
}
return nil
@@ -326,7 +371,7 @@ func verifiedOpenClawEnv(ctx context.Context, runner acpx.Runner, fallback map[s
}

baseEntries := filepath.SplitList(os.Getenv("PATH"))
nodeDirs := lookPathDir("node")
nodeDirs := commandPathDirs("node")
for _, dir := range candidateDirs {
pathEntries := uniquePathEntries([]string{dir}, nodeDirs, baseEntries)
env := map[string]string{
@@ -365,21 +410,22 @@ func openClawPathDirs() []string {
if err != nil || info.IsDir() || info.Mode()&0o111 == 0 {
continue
}
clean := filepath.Clean(dir)
if _, ok := seen[clean]; ok {
continue
for _, candidateDir := range executableDirs(path) {
if _, ok := seen[candidateDir]; ok {
continue
}
seen[candidateDir] = struct{}{}
dirs = append(dirs, candidateDir)
}
seen[clean] = struct{}{}
dirs = append(dirs, clean)
}
return dirs
}

func detectedAgentEnv(candidate detectableAgent) map[string]string {
pathEntries := uniquePathEntries(
commandPathDirs(candidate.ACPXAgent),
commandPathDirs("node"),
filepath.SplitList(os.Getenv("PATH")),
lookPathDir(candidate.ACPXAgent),
lookPathDir("node"),
)
if len(pathEntries) == 0 {
return map[string]string{}
@@ -389,19 +435,43 @@ func detectedAgentEnv(candidate detectableAgent) map[string]string {
}
}

func lookPathDir(command string) []string {
func commandPathDirs(command string) []string {
if strings.TrimSpace(command) == "" {
return nil
}
path, err := exec.LookPath(command)
if err != nil {
return nil
}
dir := strings.TrimSpace(filepath.Dir(path))
if dir == "" {
return executableDirs(path)
}

func executableDirs(path string) []string {
path = strings.TrimSpace(path)
if path == "" {
return nil
}
return []string{dir}

dirs := make([]string, 0, 2)
add := func(dir string) {
dir = strings.TrimSpace(dir)
if dir == "" {
return
}
dir = filepath.Clean(dir)
for _, existing := range dirs {
if existing == dir {
return
}
}
dirs = append(dirs, dir)
}

if resolved, err := filepath.EvalSymlinks(path); err == nil {
add(filepath.Dir(resolved))
}
add(filepath.Dir(path))
return dirs
}

func uniquePathEntries(groups ...[]string) []string {
Expand Down Expand Up @@ -655,6 +725,7 @@ func runDaemon(ctx context.Context, args []string, update updateRunner, detect d
*nodeID,
*reconnectToken,
*configPath,
*statePath,
runner,
sessions,
func(serverURL string) daemonClient {
@@ -929,6 +1000,7 @@ func runDaemonLoop(
nodeID string,
reconnectToken string,
configPath string,
statePath string,
runner acpx.Runner,
sessions *sessionStore,
clientFactory daemonClientFactory,
@@ -947,6 +1019,7 @@ func runDaemonLoop(
nodeID,
reconnectToken,
configPath,
statePath,
runner,
sessions,
clientFactory,
@@ -982,6 +1055,7 @@ func runDaemonSession(
nodeID string,
reconnectToken string,
configPath string,
statePath string,
runner acpx.Runner,
sessions *sessionStore,
clientFactory daemonClientFactory,
@@ -1135,8 +1209,18 @@ func runDaemonSession(
}
case "node.update":
logf("update command node=%s", nodeID)
sendNodeLog(sessionCtx, client, nodeID, "warn", "node update requested", nil)
if err := update(sessionCtx, os.Stdout, os.Stderr); err != nil {
sendNodeLog(sessionCtx, client, nodeID, "warn", "node update requested", map[string]any{
"serverUrl": serverURL,
"config": configPath,
"state": statePath,
})
if err := update(sessionCtx, os.Stdout, os.Stderr, nodeUpdateOptions{
ServerURL: serverURL,
NodeID: nodeID,
ConfigPath: configPath,
StatePath: statePath,
SelfUpdate: true,
}); err != nil {
sendNodeLog(sessionCtx, client, nodeID, "error", "node update failed", map[string]any{
"error": err.Error(),
})