Merged
3 changes: 2 additions & 1 deletion docs/local-dev.md
@@ -57,7 +57,7 @@ sh -n scripts/install-amesh-node.sh
 - The published remote bootstrap path is `curl .../install-amesh-node.sh | ... bash`, so the installer must keep working when Bash reads it from stdin instead of from a file.
 - The installer now logs whether it is reusing or creating config/state, and on systemd hosts it fails the install if the user service does not remain active after startup. When that happens it prints both `systemctl --user status` and recent `journalctl --user -u amesh-node` output.
 - `install-amesh-node.sh` also normalizes `~/.acpx/config.json` so ACPX non-interactive health probes start from a valid baseline on first install.
-- Detected agents now persist the registering shell's `PATH` into node config. This avoids later service-only regressions where a systemd user unit resolves a different `node` binary than the interactive shell that successfully ran the same agent CLI.
+- Detected agents now persist the registering shell's `PATH` into node config and prepend the resolved executable directories for the detected agent CLI and `node`. This avoids later service-only regressions where a systemd user unit or an `fnm` multishell shim resolves a different or stale Node runtime than the interactive shell that successfully ran the same agent CLI.
 - ACP aliases for external clients can be served locally with `go run ./cmd/amesh acp <alias>`. The default alias registry is `~/.config/amesh/acp.json`:

 ```json
@@ -133,3 +133,4 @@ amesh-node update

 Authenticated admins can also trigger the same node-side updater from the dashboard. The control plane sends a `node.update` command over the existing node websocket, the daemon runs `amesh-node update`, and a managed systemd service should restart back into the new binary after the process exits.
 - The dashboard only shows the update action when the node reports an installed release tag and that tag differs from the control plane's latest known GitHub release tag.
+- Daemon-triggered self-updates reuse the node's active `server`, `config`, and `state` paths and deliberately avoid `systemctl stop` during the update run. The daemon exits after the installer finishes and systemd restarts it into the new binary.
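
The self-update bullet above could be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `updateOptions`, `lookupVar`, and `installerEnv` are hypothetical names, and the URLs and paths in `main` are placeholders. The environment variable names (`SERVER_URL`, `CONFIG_PATH`, `STATE_PATH`, `AMESH_NODE_SELF_UPDATE`) are the ones that appear in this change. The idea is that values the daemon already knows are forwarded to the installer only when the operator has not exported them.

```go
package main

import (
	"fmt"
	"strings"
)

// updateOptions is a hypothetical stand-in for what a running daemon
// already knows about itself.
type updateOptions struct {
	ServerURL  string
	ConfigPath string
	StatePath  string
	SelfUpdate bool
}

// lookupVar returns the last value of key in env ("KEY=value" entries),
// or "" when the key is absent.
func lookupVar(env []string, key string) string {
	prefix := key + "="
	value := ""
	for _, kv := range env {
		if strings.HasPrefix(kv, prefix) {
			value = strings.TrimPrefix(kv, prefix)
		}
	}
	return value
}

// installerEnv copies base and appends daemon-known values only for keys
// that base leaves unset or empty, so explicit operator overrides win.
func installerEnv(base []string, opts updateOptions) []string {
	env := append([]string(nil), base...)
	add := func(key, value string) {
		if strings.TrimSpace(value) != "" && lookupVar(base, key) == "" {
			env = append(env, key+"="+value)
		}
	}
	add("SERVER_URL", opts.ServerURL)
	add("CONFIG_PATH", opts.ConfigPath)
	add("STATE_PATH", opts.StatePath)
	if opts.SelfUpdate {
		// In this diff the installer reads this flag to skip
		// `systemctl stop` and the post-start active check.
		env = append(env, "AMESH_NODE_SELF_UPDATE=1")
	}
	return env
}

func main() {
	// Placeholder values for illustration only.
	base := []string{"SERVER_URL=https://override.example"}
	env := installerEnv(base, updateOptions{
		ServerURL:  "https://mesh.example",
		ConfigPath: "/tmp/amesh/config.json",
		StatePath:  "/tmp/amesh/state.json",
		SelfUpdate: true,
	})
	for _, kv := range env {
		fmt.Println(kv)
	}
}
```

In this sketch the exported `SERVER_URL` is kept and the daemon's own value is not appended, while `CONFIG_PATH` and `STATE_PATH` are filled in from the daemon.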
6 changes: 6 additions & 0 deletions docs/past-failures.md
@@ -113,6 +113,12 @@
 - Consequence: a fresh node could advertise agents, yet the dashboard showed runtime errors like `/usr/bin/env: 'node': No such file or directory` or `toSorted is not a function` once the daemon tried to execute them.
 - Mitigation: detected agent configs now persist the working shell `PATH`, and the installer now fails fast unless `node` `22.x+` is available before it installs the daemon service. Covered by a Go detection test that asserts the saved agent env includes the original `PATH`.

+## 2026-05-15: `fnm` multishell shims made saved agent PATH entries go stale
+
+- Symptom: daemon-side health probes failed with `/usr/bin/env: 'node': No such file or directory` even though detection succeeded in an `fnm` shell and the saved config already included `PATH`.
+- Cause: detection persisted the shell's raw PATH order. In `fnm` environments that can put transient multishell shim directories ahead of the stable Node installation path, so later daemon runs reused a dead shim directory.
+- Mitigation: detected agent env now prepends the resolved executable directories for both the agent CLI and `node`, then appends the original shell `PATH` as fallback. Covered by a Go regression test that simulates `fnm`-style symlink shims.
+
 ## 2026-05-11: Node inventory had no lightweight way to express multiple working directories

 - The node config only described base agents, so a single machine could not advertise the same local agent across multiple useful workspaces without hand-editing duplicate agent entries.
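
The ordering described in the 2026-05-15 `fnm` mitigation (resolved executable directories first, original shell `PATH` last) can be illustrated with a small order-preserving merge. This is a sketch under assumptions: `mergePathEntries` is a hypothetical helper, not the PR's `uniquePathEntries` (whose body is not shown in this diff), and the directories in `main` are made-up placeholders.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// mergePathEntries is a hypothetical helper: it walks the groups in order,
// cleans each entry, and keeps only the first occurrence of a directory.
func mergePathEntries(groups ...[]string) []string {
	seen := make(map[string]struct{})
	var out []string
	for _, group := range groups {
		for _, entry := range group {
			entry = strings.TrimSpace(entry)
			if entry == "" {
				continue
			}
			entry = filepath.Clean(entry)
			if _, ok := seen[entry]; ok {
				continue
			}
			seen[entry] = struct{}{}
			out = append(out, entry)
		}
	}
	return out
}

func main() {
	// Placeholder directories: stable install locations come first, the
	// shell PATH (which may start with a transient shim) comes last.
	agentDirs := []string{"/opt/agent/bin"}
	nodeDirs := []string{"/opt/node/bin"}
	shellPath := []string{"/tmp/fnm-shim/bin", "/opt/node/bin", "/usr/bin"}

	merged := mergePathEntries(agentDirs, nodeDirs, shellPath)
	fmt.Println(strings.Join(merged, string(filepath.ListSeparator)))
}
```

Because earlier groups win, a shim directory that later disappears only affects a fallback entry, while the stable directories are searched first.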
28 changes: 17 additions & 11 deletions install-amesh-node.sh
@@ -18,6 +18,7 @@ SERVICE_PATH="${SERVICE_PATH:-$HOME/.config/systemd/user/${SERVICE_NAME}.service
 SERVER_URL="${SERVER_URL:-}"
 REGISTRATION_TOKEN="${REGISTRATION_TOKEN:-}"
 NODE_ID="${NODE_ID:-$(hostname)-amesh}"
+SELF_UPDATE="${AMESH_NODE_SELF_UPDATE:-0}"

 log() {
   printf '%s\n' "$*" >&2
@@ -194,7 +195,7 @@ main() {
   need_cmd install
   need_cmd mkdir

-  if [[ -z "$SERVER_URL" ]]; then
+  if [[ -z "$SERVER_URL" && ! -f "$STATE_PATH" ]]; then
     fail "SERVER_URL is required"
   fi

@@ -246,7 +247,7 @@ main() {
     install -m 0755 "${extract_dir}/${cli_binary_name}" "${cli_binary_path}"
   fi

-  if command -v systemctl >/dev/null 2>&1; then
+  if command -v systemctl >/dev/null 2>&1 && [[ "$SELF_UPDATE" != "1" ]]; then
     systemctl --user stop "$SERVICE_NAME" >/dev/null 2>&1 || true
   fi

@@ -309,16 +310,21 @@ EOF

   if command -v systemctl >/dev/null 2>&1; then
     systemctl --user daemon-reload
-    systemctl --user enable --now "$SERVICE_NAME"
-    sleep 2
-    if ! systemctl --user --quiet is-active "$SERVICE_NAME"; then
-      log "service failed to stay active: $SERVICE_NAME"
-      systemctl --user --no-pager --full status "$SERVICE_NAME" >&2 || true
-      journalctl --user -u "$SERVICE_NAME" -n 80 --no-pager >&2 || true
-      fail "amesh-node user service did not reach active state"
+    if [[ "$SELF_UPDATE" == "1" ]]; then
+      systemctl --user enable "$SERVICE_NAME"
+      log "prepared user service restart after self-update: $SERVICE_NAME"
+    else
+      systemctl --user enable --now "$SERVICE_NAME"
+      sleep 2
+      if ! systemctl --user --quiet is-active "$SERVICE_NAME"; then
+        log "service failed to stay active: $SERVICE_NAME"
+        systemctl --user --no-pager --full status "$SERVICE_NAME" >&2 || true
+        journalctl --user -u "$SERVICE_NAME" -n 80 --no-pager >&2 || true
+        fail "amesh-node user service did not reach active state"
+      fi
+      log "installed and started user service: $SERVICE_NAME"
+      log "service logs: journalctl --user -u ${SERVICE_NAME} -f"
     fi
-    log "installed and started user service: $SERVICE_NAME"
-    log "service logs: journalctl --user -u ${SERVICE_NAME} -f"
   else
     log "systemctl not found; service file written to $SERVICE_PATH"
     log "start manually: AMESH_ACPX_PATH='${ACPX_BIN}' '${binary_path}' run --state '${STATE_PATH}'"
96 changes: 79 additions & 17 deletions internal/app/app.go
@@ -41,7 +41,15 @@ type sleeper func(ctx context.Context, delay time.Duration) error

 type capabilityProber func(ctx context.Context, agent nodeconfig.AgentConfig) error

-type updateRunner func(ctx context.Context, stdout, stderr io.Writer) error
+type nodeUpdateOptions struct {
+    ServerURL  string
+    NodeID     string
+    ConfigPath string
+    StatePath  string
+    SelfUpdate bool
+}
+
+type updateRunner func(ctx context.Context, stdout, stderr io.Writer, options nodeUpdateOptions) error
 type detectRunner func(ctx context.Context, configPath string) error

 type retryableDaemonError struct {
@@ -99,7 +107,7 @@ func run(ctx context.Context, args []string, update updateRunner, detect detectR
     case "detect":
         return runDetectCommand(ctx, args[1:], detect)
     case "update":
-        return update(ctx, os.Stdout, os.Stderr)
+        return update(ctx, os.Stdout, os.Stderr, nodeUpdateOptions{})
     case "acp":
         return runACPBridge(ctx, args[1:], os.Stdin, os.Stdout)
     case "logs":
@@ -159,7 +167,7 @@ func runACPBridge(ctx context.Context, args []string, stdin io.Reader, stdout io
     return bridge.Serve(ctx, stdin, stdout)
 }

-func runUpdate(ctx context.Context, stdout, stderr io.Writer) error {
+func runUpdate(ctx context.Context, stdout, stderr io.Writer, options nodeUpdateOptions) error {
     if _, err := exec.LookPath("bash"); err != nil {
         return errors.New("required CLI missing: bash")
     }
@@ -180,6 +188,21 @@ func runUpdate(ctx context.Context, stdout, stderr io.Writer) error {
     cmd.Stdout = stdout
     cmd.Stderr = stderr
     cmd.Env = append(os.Environ(), "AMESH_INSTALL_URL="+installerURL)
+    if strings.TrimSpace(options.ServerURL) != "" && os.Getenv("SERVER_URL") == "" {
+        cmd.Env = append(cmd.Env, "SERVER_URL="+options.ServerURL)
+    }
+    if strings.TrimSpace(options.NodeID) != "" && os.Getenv("NODE_ID") == "" {
+        cmd.Env = append(cmd.Env, "NODE_ID="+options.NodeID)
+    }
+    if strings.TrimSpace(options.ConfigPath) != "" && os.Getenv("CONFIG_PATH") == "" {
+        cmd.Env = append(cmd.Env, "CONFIG_PATH="+options.ConfigPath)
+    }
+    if strings.TrimSpace(options.StatePath) != "" && os.Getenv("STATE_PATH") == "" {
+        cmd.Env = append(cmd.Env, "STATE_PATH="+options.StatePath)
+    }
+    if options.SelfUpdate {
+        cmd.Env = append(cmd.Env, "AMESH_NODE_SELF_UPDATE=1")
+    }
     if os.Getenv("INSTALL_DIR") == "" {
         if installDir, ok := currentInstallDir(); ok {
             cmd.Env = append(cmd.Env, "INSTALL_DIR="+installDir)
@@ -326,7 +349,7 @@ func verifiedOpenClawEnv(ctx context.Context, runner acpx.Runner, fallback map[s
     }

     baseEntries := filepath.SplitList(os.Getenv("PATH"))
-    nodeDirs := lookPathDir("node")
+    nodeDirs := commandPathDirs("node")
     for _, dir := range candidateDirs {
         pathEntries := uniquePathEntries([]string{dir}, nodeDirs, baseEntries)
         env := map[string]string{
@@ -365,21 +388,22 @@ func openClawPathDirs() []string {
         if err != nil || info.IsDir() || info.Mode()&0o111 == 0 {
             continue
         }
-        clean := filepath.Clean(dir)
-        if _, ok := seen[clean]; ok {
-            continue
+        for _, candidateDir := range executableDirs(path) {
+            if _, ok := seen[candidateDir]; ok {
+                continue
+            }
+            seen[candidateDir] = struct{}{}
+            dirs = append(dirs, candidateDir)
         }
-        seen[clean] = struct{}{}
-        dirs = append(dirs, clean)
     }
     return dirs
 }

 func detectedAgentEnv(candidate detectableAgent) map[string]string {
     pathEntries := uniquePathEntries(
+        commandPathDirs(candidate.ACPXAgent),
+        commandPathDirs("node"),
         filepath.SplitList(os.Getenv("PATH")),
-        lookPathDir(candidate.ACPXAgent),
-        lookPathDir("node"),
     )
     if len(pathEntries) == 0 {
         return map[string]string{}
@@ -389,19 +413,43 @@ func detectedAgentEnv(candidate detectableAgent) map[string]string {
     }
 }

-func lookPathDir(command string) []string {
+func commandPathDirs(command string) []string {
     if strings.TrimSpace(command) == "" {
         return nil
     }
     path, err := exec.LookPath(command)
     if err != nil {
         return nil
     }
-    dir := strings.TrimSpace(filepath.Dir(path))
-    if dir == "" {
+    return executableDirs(path)
+}
+
+func executableDirs(path string) []string {
+    path = strings.TrimSpace(path)
+    if path == "" {
         return nil
     }
-    return []string{dir}
+
+    dirs := make([]string, 0, 2)
+    add := func(dir string) {
+        dir = strings.TrimSpace(dir)
+        if dir == "" {
+            return
+        }
+        dir = filepath.Clean(dir)
+        for _, existing := range dirs {
+            if existing == dir {
+                return
+            }
+        }
+        dirs = append(dirs, dir)
+    }
+
+    if resolved, err := filepath.EvalSymlinks(path); err == nil {
+        add(filepath.Dir(resolved))
+    }
+    add(filepath.Dir(path))
+    return dirs
 }

 func uniquePathEntries(groups ...[]string) []string {
@@ -655,6 +703,7 @@ func runDaemon(ctx context.Context, args []string, update updateRunner, detect d
         *nodeID,
         *reconnectToken,
         *configPath,
+        *statePath,
         runner,
         sessions,
         func(serverURL string) daemonClient {
@@ -929,6 +978,7 @@ func runDaemonLoop(
     nodeID string,
     reconnectToken string,
     configPath string,
+    statePath string,
     runner acpx.Runner,
     sessions *sessionStore,
     clientFactory daemonClientFactory,
@@ -947,6 +997,7 @@ func runDaemonLoop(
             nodeID,
             reconnectToken,
             configPath,
+            statePath,
             runner,
             sessions,
             clientFactory,
@@ -982,6 +1033,7 @@ func runDaemonSession(
     nodeID string,
     reconnectToken string,
     configPath string,
+    statePath string,
     runner acpx.Runner,
     sessions *sessionStore,
     clientFactory daemonClientFactory,
@@ -1135,8 +1187,18 @@ func runDaemonSession(
             }
         case "node.update":
             logf("update command node=%s", nodeID)
-            sendNodeLog(sessionCtx, client, nodeID, "warn", "node update requested", nil)
-            if err := update(sessionCtx, os.Stdout, os.Stderr); err != nil {
+            sendNodeLog(sessionCtx, client, nodeID, "warn", "node update requested", map[string]any{
+                "serverUrl": serverURL,
+                "config":    configPath,
+                "state":     statePath,
+            })
+            if err := update(sessionCtx, os.Stdout, os.Stderr, nodeUpdateOptions{
+                ServerURL:  serverURL,
+                NodeID:     nodeID,
+                ConfigPath: configPath,
+                StatePath:  statePath,
+                SelfUpdate: true,
+            }); err != nil {
                 sendNodeLog(sessionCtx, client, nodeID, "error", "node update failed", map[string]any{
                     "error": err.Error(),
                 })