Native sh workload keeps running after workload stop (quickstart example) #520

@hjoukl

Hi,

very cool NATS functionality indeed.

In the quickstart example, though, I can't seem to stop the "native" sh workload once I've started it.
The workload stop command responds with "Workload ... successfully stopped", but the workload just keeps on running.

Here goes:

Start nex node on server1:

0 user@server1 .../~ $ nex -s nats://127.0.0.1:4222 node -l debug up
2026-05-22 17:02:32 [INFO] Starting nex node version=0.4.1 commit=8930ba7c695bd209014d2a30bd4889ae67c5049d build_date=2026-03-16T14:23:28Z node_id=NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS name=nex-node-988 nexus=nexus nats_server=nats://127.0.0.1:4222 start_time=2026-05-22T17:02:32+02:00
2026-05-22 17:02:32 [DEBUG] Subscribed to nats subject subject=$NEX.SVC.system.control.PTAGS queue_group=NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS
2026-05-22 17:02:32 [DEBUG] Subscribed to nats subject subject=$NEX.SVC.system.control.PING queue_group=NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS
2026-05-22 17:02:32 [DEBUG] Subscribed to nats subject subject=$NEX.SVC.system.control.PING.NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS queue_group=NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS
2026-05-22 17:02:32 [DEBUG] Subscribed to nats subject subject=$NEX.SVC.system.control.INFO.NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS queue_group=NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS
2026-05-22 17:02:32 [DEBUG] Subscribed to nats subject subject=$NEX.SVC.system.control.LAMEDUCK.NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS queue_group=NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS
2026-05-22 17:02:32 [DEBUG] Subscribed to nats subject subject=$NEX.SVC.system.control.AGENTID.NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS queue_group=NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS
2026-05-22 17:02:32 [DEBUG] Subscribed to nats subject subject=$NEX.SVC.NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS.agent.REGISTER.* queue_group=NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS
2026-05-22 17:02:32 [DEBUG] Subscribed to nats subject subject=$NEX.SVC.*.control.AUCTION queue_group=NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS
2026-05-22 17:02:32 [DEBUG] Subscribed to nats subject subject=$NEX.SVC.*.control.UNDEPLOY.* queue_group=NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS
2026-05-22 17:02:32 [DEBUG] Subscribed to nats subject subject=$NEX.SVC.*.control.ADEPLOY.* queue_group=NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS
2026-05-22 17:02:32 [DEBUG] Subscribed to nats subject subject=$NEX.SVC.*.control.CLONE.* queue_group=NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS
2026-05-22 17:02:32 [DEBUG] Subscribed to nats subject subject=$NEX.SVC.*.control.WPING queue_group=NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS
agent-watcher | 2026-05-22 17:02:32 [DEBUG] starting embedded agent agent_id=eQ6NkJBundwJEROrX2MYcb restart_count=0
2026-05-22 17:02:32 [INFO] agent registered name=go_exec type=native agent_id=eQ6NkJBundwJEROrX2MYcb
2026-05-22 17:02:32 [INFO] nex node ready

server2: check for nodes, start a workload:

0 user@server2 .../~ $ nex -s nats://127.0.0.1:4222 node ls
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                           Nex Nodes                                                           │
├───────┬──────────────────────────────────────────────────────────┬──────────────┬─────────┬────────┬─────────┬────────────────┤
│ Nexus │ ID (* = Lameduck Mode)                                   │ Name         │ Version │ Uptime │ State   │ Running Agents │
├───────┼──────────────────────────────────────────────────────────┼──────────────┼─────────┼────────┼─────────┼────────────────┤
│ nexus │ NDZPOXFASJBXXDWWW67MNGMKICZJUG6YEIWOEERB3CIZQSYMKU6TLPIS │ nex-node-988 │ 0.4.1   │ 36s    │ running │              1 │
╰───────┴──────────────────────────────────────────────────────────┴──────────────┴─────────┴────────┴─────────┴────────────────╯
0 user@server2 .../~ $ nex -s nats://127.0.0.1:4222 workload start -f Nexfile
Workload system-reporter [eQ6NkJBundwJEROrX2MYmH] successfully started

server1: workload received and starts running:

2026-05-22 17:03:22 [DEBUG] responding to auction auctionId=18TnhcFRObFVg4fOeslizw
native-agent | 2026-05-22 17:03:23 [DEBUG] start workload request received workloadId=eQ6NkJBundwJEROrX2MYmH namespace=system
native-agent | 2026-05-22 17:03:23 [DEBUG] namespace created namespace=system
native-agent | 2026-05-22 17:03:23 [DEBUG] located artifact artifact_reference=&{sh  file:///bin/sh /bin/sh 97995faa249e5706dd0b0373c9da547709bf7349755d5fc8e52f97a4bd04feaf 1389072}
native-agent | 2026-05-22 17:03:23 [DEBUG] running binary binary=file:///bin/sh args=[-c while true; do echo "[$(date '+%!Y(MISSING)-%!m(MISSING)-%!d(MISSING) %!H(MISSING):%!M(MISSING):%!S(MISSING)')] Uptime: $(uptime)"; sleep 30; done]
native-agent | 2026-05-22 17:03:23 [DEBUG] workload created namespace=system workloadId=eQ6NkJBundwJEROrX2MYmH restart=false
2026/05/22 17:03:23 INFO [2026-05-22 17:03:23] Uptime:  17:03:23 up 15 days,  7:58,  3 users,  load average: 0.02, 0.08, 0.06\n workload.agent=eQ6NkJBundwJEROrX2MYcb workload.workload=eQ6NkJBundwJEROrX2MYmH
2026/05/22 17:03:53 INFO [2026-05-22 17:03:53] Uptime:  17:03:53 up 15 days,  7:59,  3 users,  load average: 0.07, 0.09, 0.06\n workload.agent=eQ6NkJBundwJEROrX2MYcb workload.workload=eQ6NkJBundwJEROrX2MYmH

server2: stop workload:

0 user@server2 .../~ $ nex -s nats://127.0.0.1:4222 workload ls
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                              Running Workloads - system                                              │
├────────────────────────┬─────────────────┬───────────────────────────┬────────────────┬────────┬───────────┬─────────┤
│ Id                     │ Name            │ Start Time                │ Execution Time │ Type   │ Lifecycle │ State   │
├────────────────────────┼─────────────────┼───────────────────────────┼────────────────┼────────┼───────────┼─────────┤
│ eQ6NkJBundwJEROrX2MYmH │ system-reporter │ 2026-05-22T17:03:23+02:00 │ --             │ native │ service   │ running │
╰────────────────────────┴─────────────────┴───────────────────────────┴────────────────┴────────┴───────────┴─────────╯
0 user@server2 .../~ $ nex -s nats://127.0.0.1:4222 workload stop eQ6NkJBundwJEROrX2MYmH
Workload eQ6NkJBundwJEROrX2MYmH successfully stopped

server1: the stop request seems to have been received, but the workload keeps running until the node is terminated with Ctrl-C. Note the "attempted to unregister a non-existent trigger workload_id=..." message:

native-agent | 2026-05-22 17:04:00 [DEBUG] attempted to unregister a non-existent trigger workload_id=eQ6NkJBundwJEROrX2MYmH
2026/05/22 17:04:23 INFO [2026-05-22 17:04:23] Uptime:  17:04:23 up 15 days,  7:59,  3 users,  load average: 0.04, 0.08, 0.06\n workload.agent=eQ6NkJBundwJEROrX2MYcb workload.workload=eQ6NkJBundwJEROrX2MYmH
2026/05/22 17:04:53 INFO [2026-05-22 17:04:53] Uptime:  17:04:53 up 15 days,  8:00,  3 users,  load average: 0.02, 0.07, 0.06\n workload.agent=eQ6NkJBundwJEROrX2MYcb workload.workload=eQ6NkJBundwJEROrX2MYmH
2026/05/22 17:05:23 INFO [2026-05-22 17:05:23] Uptime:  17:05:23 up 15 days,  8:00,  3 users,  load average: 0.01, 0.06, 0.06\n workload.agent=eQ6NkJBundwJEROrX2MYcb workload.workload=eQ6NkJBundwJEROrX2MYmH
2026/05/22 17:05:53 INFO [2026-05-22 17:05:53] Uptime:  17:05:53 up 15 days,  8:01,  3 users,  load average: 0.01, 0.05, 0.05\n workload.agent=eQ6NkJBundwJEROrX2MYcb workload.workload=eQ6NkJBundwJEROrX2MYmH
^Cagent-watcher | 2026-05-22 17:06:12 [INFO] shutting down agent watcher agent_count=1
native-agent | 2026-05-22 17:06:12 [DEBUG] workload exited without error workload_id=eQ6NkJBundwJEROrX2MYmH namespace=system exit_code=-1
agent-watcher | 2026-05-22 17:06:12 [DEBUG] stopped embedded agent agent_id=eQ6NkJBundwJEROrX2MYcb
agent-watcher | 2026-05-22 17:06:12 [INFO] agent watcher shutdown complete
2026-05-22 17:06:12 [INFO] nex node stopped uptime=3m40.247359777s

The Nexfile used is identical to the quickstart example:

0 user@server2 .../~ $ cat Nexfile 
name: "system-reporter"
description: "Reports system stats every 30 seconds"
type: native
lifecycle: service
start_request:
  uri: "file:///bin/sh"
  argv:
    - "-c"
    - "while true; do echo \"[$(date '+%Y-%m-%d %H:%M:%S')] Uptime: $(uptime)\"; sleep 30; done"

Incidentally, stopping a nonexistent workload also reports success:

0 user@server2 .../~ $ nex -s nats://127.0.0.1:4222 workload stop foobar
Workload foobar successfully stopped
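
Since the success message appears even for an ID that does not exist, it cannot be used on its own to confirm a stop. A minimal sketch of a wrapper that checks `workload ls` after the stop request follows. It uses only the `nex` subcommands shown above; `NATS_URL`, `workload_listed` and `stop_and_verify` are hypothetical names, and it assumes that `workload ls` reflects the actual state, which this report does not verify.

```shell
#!/bin/sh
# Sketch only: NATS_URL, workload_listed and stop_and_verify are hypothetical names.
NATS_URL="${NATS_URL:-nats://127.0.0.1:4222}"

# Succeeds if the given workload ID appears in the `workload ls` table.
workload_listed() {
    nex -s "$NATS_URL" workload ls | grep -qF -- "$1"
}

# Sends the stop request, then reports what `workload ls` shows afterwards
# instead of trusting the "successfully stopped" message.
stop_and_verify() {
    id=$1
    nex -s "$NATS_URL" workload stop "$id" >/dev/null
    if workload_listed "$id"; then
        echo "still listed: $id"
        return 1
    fi
    echo "not listed: $id"
}
```

For example, `stop_and_verify eQ6NkJBundwJEROrX2MYmH` would return a non-zero status if the ID is still listed after the stop request. "Not listed" does not distinguish a workload that was stopped from one that never existed, and a listing taken immediately after the stop may lag behind the node.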

Why isn't the workload getting stopped properly? Am I doing something wrong, or is this a bug?
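
To help rule out the script itself, the same kind of loop can be run outside nex and sent SIGTERM. This is a sketch that was not part of the session above: `term_status` is a hypothetical helper name, the sleep is shortened from 30 seconds to 1, and it assumes a POSIX sh such as dash or bash.

```shell
#!/bin/sh
# Sketch only: term_status is a hypothetical helper name.
# Runs a loop shaped like the one in the Nexfile (sleep shortened to 1 second),
# sends it SIGTERM, and prints the exit status of the loop's shell.
term_status() {
    sh -c 'while true; do echo tick; sleep 1; done' >/dev/null 2>&1 &
    pid=$!
    sleep 1                          # give the loop time to start
    kill -TERM "$pid"
    status=0
    wait "$pid" || status=$?         # collect the exit status of the loop's shell
    echo "$status"
}
```

Calling `term_status` prints the exit status of the loop's shell. A value above 128 means the shell was terminated by the signal, which would suggest that the loop does not ignore a plain SIGTERM. Which signal, if any, the nex node sends on stop cannot be told from the logs above.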

Best regards
Holger
