Add deployable passive agent runtime by BadgerOps · Pull Request #54 · BadgerOps/grapheon

BadgerOps · 2026-03-22T21:25:26Z

Summary

This PR delivers the first deployable passive-agent slice for Graphēon end to end.

It adds:

a lightweight host-side passive runtime
direct manual CLI execution with documented flags and --help
systemd deployment units and install helper
passive collection from local commands only
gzip-compressed outbound-only check-ins
admin-driven per-agent API-key rotation/reissue
versioned agent packaging and release automation
an agent GHCR container image and release tarball
docs and quickstart updates for manual, systemd, artifact, and container usage
a small Nix shell fix so the backend test suite runs cleanly in the dev environment

Closes #48.
Related: #53.

What Changed

Host-side passive runtime

Added a new stdlib-only runtime under agent/:

agent/grapheon_agent.py
agent/tests/test_grapheon_agent.py
agent/README.md
agent/VERSION
agent/CHANGELOG.md

Runtime behavior:

generates and persists a random agent_uuid
stores local state under /var/lib/grapheon-agent
registers outbound using the existing enrollment-key flow
waits cleanly in pending state until approval
stores the one-time per-agent API key locally once issued
uses cached backend policy for cadence, jitter, timeouts, and command enablement
computes additive deltas against the last successful snapshot
sends gzip-compressed JSON reports to POST /api/agents/check-in
reports its version from agent/VERSION instead of a hardcoded string
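
For reviewers, the gzip-compressed check-in step above can be sketched with the standard library alone. This is a minimal illustration, not the code in agent/grapheon_agent.py: the function name `build_check_in_request`, the `X-API-Key` header name, and the report fields are assumptions. Only the POST /api/agents/check-in path and the gzip-compressed JSON body come from this PR.

```python
import gzip
import json
import urllib.request


def build_check_in_request(base_url: str, api_key: str, report: dict) -> urllib.request.Request:
    """Build (but do not send) a gzip-compressed JSON check-in request."""
    # Compact, key-sorted JSON keeps the payload small and deterministic.
    raw = json.dumps(report, separators=(",", ":"), sort_keys=True).encode("utf-8")
    body = gzip.compress(raw)
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/agents/check-in",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",
            "X-API-Key": api_key,  # hypothetical header name, assumed for this sketch
        },
    )
```

Sending the request would be a single `urllib.request.urlopen(request, timeout=...)` call, with the timeout taken from the cached backend policy.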

Passive/local-only command set in this slice:

ip -json addr show
ip -json neigh show
ip -json route show
ss -tunapH
netstat -tunap as fallback
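
The ss-then-netstat fallback could look roughly like the sketch below. The helper names `run_command` and `collect_sockets` are hypothetical, and the real runtime may choose its fallback differently (for example, based on cached policy). The `which` and `run` parameters exist only so the logic can be exercised without touching the host.

```python
import shutil
import subprocess

# Command order mirrors the list above: ss first, netstat as fallback.
SOCKET_COMMANDS = (
    ["ss", "-tunapH"],
    ["netstat", "-tunap"],
)


def run_command(argv, timeout=10.0):
    """Run one local command; return stdout, or None on any failure."""
    try:
        done = subprocess.run(
            list(argv), capture_output=True, text=True, timeout=timeout, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return done.stdout if done.returncode == 0 else None


def collect_sockets(which=shutil.which, run=run_command):
    """Return (command_name, output) from the first socket command that works."""
    for argv in SOCKET_COMMANDS:
        if which(argv[0]) is None:
            continue  # binary not installed; try the next candidate
        output = run(argv)
        if output is not None:
            return argv[0], output
    return None, None
```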

This keeps the runtime aligned with the MVP constraints:

outbound only
no active scanning
low CPU/memory/disk/network impact
one-shot execution model instead of a heavyweight daemon

Manual CLI mode

The agent is now a first-class command-line tool, not just something hidden behind a systemd oneshot unit.

Added and documented:

richer --help output with examples
--register-only
--check-in-only
direct invocation from a repo checkout or installed host copy

Manual workflows now supported explicitly:

register or poll approval without collecting
force an immediate check-in from a local state directory
run with an installed config file without using systemd
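
As a rough picture of the two manual modes, the flag handling might be wired up as below. This is a sketch only: the `--register-only` and `--check-in-only` flags are from this PR, but treating them as mutually exclusive is an assumption, and the real parser has further options that are not reproduced here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser covering only the two manual-mode flags."""
    parser = argparse.ArgumentParser(
        prog="grapheon_agent.py",
        description="Passive agent runtime (illustrative sketch).",
    )
    # Assumption: the two manual modes cannot be combined in one run.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument(
        "--register-only",
        action="store_true",
        help="register or poll approval, then exit without collecting",
    )
    mode.add_argument(
        "--check-in-only",
        action="store_true",
        help="collect and check in immediately, skipping registration",
    )
    return parser
```

With no flags, a parser like this leaves both attributes `False`, which would correspond to the default timer-driven run.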

This makes the runtime easier to test, debug, and operate during rollout.

Deployment artifacts and container

Added:

deploy/grapheon-agent.service
deploy/grapheon-agent.timer
deploy/grapheon-agent.env.example
scripts/install-passive-agent.sh
scripts/build-agent-artifact.sh
agent/Dockerfile

Deployment/distribution model:

systemd oneshot service
systemd timer every 15 minutes by default
local runtime gating based on cached backend policy
simple installer that places runtime files under /opt/grapheon, seeds /etc/grapheon-agent.env, and creates /var/lib/grapheon-agent
versioned release tarball: grapheon-agent-vX.Y.Z.tar.gz
GHCR image: ghcr.io/badgerops/grapheon-agent:latest and :vX.Y.Z
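
The local runtime gating mentioned above means the timer can fire every 15 minutes while the agent itself decides whether a check-in is actually due. A minimal sketch of that decision follows, assuming the cached policy exposes a cadence and a jitter bound in seconds. The function and parameter names are hypothetical, not the runtime's actual API.

```python
import random
from typing import Optional


def is_due(now: float, last_success: Optional[float], cadence_seconds: float) -> bool:
    """True when no check-in has succeeded yet or the cadence has elapsed."""
    if last_success is None:
        return True
    return (now - last_success) >= cadence_seconds


def jitter_delay(jitter_seconds: float, rng: random.Random) -> float:
    """Pick a random delay in [0, jitter_seconds] to spread out check-ins."""
    return rng.uniform(0.0, max(0.0, jitter_seconds))
```

A oneshot run would call `is_due(time.time(), last_success, cadence)` first, exit immediately when it returns `False`, and otherwise sleep for `jitter_delay(...)` before collecting.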

The agent image is intended for per-host deployment and is documented with host network/PID namespace examples for containerized runs.

GitHub release automation

Extended .github/workflows/release.yml so merges to master now also release the passive agent.

New behavior:

reads agent/VERSION
validates agent/CHANGELOG.md
checks for an existing agent-vX.Y.Z tag
creates the agent-vX.Y.Z tag and GitHub release
uploads grapheon-agent-vX.Y.Z.tar.gz to that release
builds and pushes ghcr.io/badgerops/grapheon-agent:latest
builds and pushes ghcr.io/badgerops/grapheon-agent:vX.Y.Z

The workflow name is updated from Release Images to Release Components to reflect the broader scope.
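
The naming scheme the workflow derives from agent/VERSION can be summarized in a few lines. The real logic lives in .github/workflows/release.yml. The Python below only illustrates the names listed above, and the `needs_release` check assumes that a release is skipped when the tag already exists.

```python
def release_names(version: str) -> dict:
    """Derive the tag, tarball, and image names for one agent version."""
    version = version.strip()  # VERSION files usually end with a newline
    return {
        "tag": f"agent-v{version}",
        "tarball": f"grapheon-agent-v{version}.tar.gz",
        "images": [
            "ghcr.io/badgerops/grapheon-agent:latest",
            f"ghcr.io/badgerops/grapheon-agent:v{version}",
        ],
    }


def needs_release(version: str, existing_tags: set) -> bool:
    """Assumption: release only when the agent tag does not exist yet."""
    return release_names(version)["tag"] not in existing_tags
```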

Backend follow-up: API-key recovery

The runtime surfaced an operational gap: if an approved agent lost /var/lib/grapheon-agent/api_key, there was no recovery path short of re-enrollment.

This PR adds:

POST /api/agents/{id}/rotate-api-key

Behavior:

admin-only
only valid for active agents
rotates the stored per-agent API key
returns the new raw secret once
immediately invalidates the previous key

This keeps bootstrap and steady-state auth separate while giving operators a clean recovery path.
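
The rotation semantics above (active agents only, raw secret returned once, old key invalid immediately) can be modeled in isolation as follows. This is a sketch under assumptions: the `AgentRecord` class is hypothetical, storing a SHA-256 digest instead of the raw key is an assumption about the backend, and the admin-only check is omitted.

```python
import hashlib
import hmac
import secrets


class AgentRecord:
    """Hypothetical stand-in for the backend's agent row."""

    def __init__(self, status: str):
        self.status = status
        self.api_key_hash = None


def rotate_api_key(agent: AgentRecord) -> str:
    """Replace the stored key and return the new raw secret exactly once."""
    if agent.status != "active":
        raise ValueError("API keys can only be rotated for active agents")
    raw = secrets.token_urlsafe(32)
    # Overwriting the stored digest is what invalidates the previous key.
    agent.api_key_hash = hashlib.sha256(raw.encode("utf-8")).hexdigest()
    return raw


def verify_api_key(agent: AgentRecord, presented: str) -> bool:
    """Constant-time comparison of the presented key against the stored digest."""
    if agent.api_key_hash is None:
        return False
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, agent.api_key_hash)
```

After rotation, the operator would write the returned secret to the agent's local api_key file. The backend never needs to show it again.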

Version bumps and changelogs

Updated:

backend/VERSION → 0.10.0
backend/CHANGELOG.md
agent/VERSION → 0.2.0
agent/CHANGELOG.md
scripts/validate_versions.py now validates agent version/changelog sync too

Documentation

Updated:

README.md
docs/README.md
docs/agents.md
docs/agent_quickstart.md
docs/deployment.md
docs/release-process.md
backend/README.md
agent/README.md

Docs now describe:

the shipped runtime
direct CLI usage and --help
register-only and check-in-only modes
install from repo checkout or release tarball
containerized agent runs from GHCR
the install and approval flow
the systemd timer model
local agent state files
current delta behavior and limitations
API-key rotation/recovery workflow
the new component release process for the agent

Dev shell fix

Updated flake.nix to include:

iproute2
nettools
util-linux
stdenv.cc.cc.lib

Also exports LD_LIBRARY_PATH for the C++ runtime library so greenlet / SQLAlchemy async imports work under nix develop.

That change unblocked the previously failing backend test collection in this environment.

Key Decisions

One-shot runtime, not a long-running daemon

The agent runs as a short-lived process from a systemd timer by default, but the same runtime can now be executed directly with flags for manual workflows. That preserves the low-footprint design while making rollout and debugging practical.

Random persisted `agent_uuid`

Identity remains a locally persisted random UUID, not a MAC-derived identifier. MACs are still collected as metadata but are not treated as identity.
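
A load-or-create routine for this kind of identity is short. The sketch below assumes the UUID is stored in a file named `agent_uuid` inside the state directory. That file name and the function name are illustrative, and the real layout under /var/lib/grapheon-agent may differ.

```python
import os
import uuid
from pathlib import Path


def load_or_create_agent_uuid(state_dir: Path) -> str:
    """Return the persisted agent UUID, generating one on first run."""
    path = state_dir / "agent_uuid"  # assumed file name
    if path.exists():
        return path.read_text(encoding="utf-8").strip()
    state_dir.mkdir(parents=True, exist_ok=True)
    value = str(uuid.uuid4())  # random, not derived from MAC or hostname
    # Write to a temp file and rename so a crash never leaves a partial UUID.
    tmp = path.with_suffix(".tmp")
    tmp.write_text(value + "\n", encoding="utf-8")
    os.replace(tmp, path)
    return value
```

Because the value is random and only read back from disk, replacing a NIC or cloning a MAC does not change the agent's identity.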

Additive delta reports

The runtime sends additive/update-only deltas based on the last successful snapshot. This keeps reports small and avoids needing delete semantics the backend does not yet model.

Current limitation:

removals are not represented yet
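
To make the additive semantics concrete, here is a minimal sketch of an update-only delta between two snapshots. It assumes each snapshot is a dict keyed by some stable identifier (an IP address in the example). The runtime's actual snapshot shape is not shown in this description.

```python
def additive_delta(previous: dict, current: dict) -> dict:
    """Return entries that are new or changed in `current`.

    Keys present only in `previous` are deliberately ignored, which is
    the "removals are not represented" limitation noted above.
    """
    return {
        key: value
        for key, value in current.items()
        if key not in previous or previous[key] != value
    }
```

For example, if the last successful snapshot held 10.0.0.1 and 10.0.0.2 and the current one holds 10.0.0.1 (unchanged) and 10.0.0.3, the delta contains only 10.0.0.3. The disappearance of 10.0.0.2 is not reported.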

Recovery via API-key rotation

Instead of overloading registration or requiring re-enrollment, recovery is handled by an explicit admin rotation endpoint. That keeps the agent auth model simple and operationally clear.

Agent-specific versioning and releases

The passive agent now has its own version and changelog. That keeps host-runtime release cadence independent from backend/frontend changes while still using the same GitHub/GHCR release flow.

Files Of Interest

agent/grapheon_agent.py
agent/Dockerfile
agent/VERSION
agent/CHANGELOG.md
agent/tests/test_grapheon_agent.py
scripts/build-agent-artifact.sh
scripts/install-passive-agent.sh
.github/workflows/release.yml
backend/routers/agents.py
backend/schemas.py
backend/tests/test_agents.py
backend/VERSION
backend/CHANGELOG.md
docs/agent_quickstart.md
docs/agents.md
docs/release-process.md
docs/deployment.md
flake.nix

Testing

Ran:

python3 scripts/validate_versions.py
bash -n scripts/build-agent-artifact.sh
bash scripts/build-agent-artifact.sh 0.2.0 /tmp/grapheon-agent-v0.2.0.tar.gz
tar -tzf /tmp/grapheon-agent-v0.2.0.tar.gz
python3 agent/grapheon_agent.py --help
nix develop -c .venv/bin/python -m pytest

Result:

version/changelog validation passes
release tarball contents look correct
full suite passes in this branch: 367 passed, 1 skipped

Follow-Up Work

#53 Package and distribute the passive agent runtime

This PR adds the first release artifact and GHCR image, but broader packaging/distribution work is still tracked in #53.

Likely deliverables there:

upgrade/rollback workflow around released artifacts
optional .deb / .rpm packaging
stronger artifact signing/integrity story

Remaining broader agent work

Still out of scope here:

frontend fleet and approval UI
richer agent operational views and health summaries
server-side raw-command ingestion or alternate collectors
optional future mTLS hardening if we choose to add it later

Ship the first host-side passive agent runtime for Graphēon with a lightweight one-shot collector, systemd service/timer units, installation helper, agent tests, and updated operator docs. The runtime keeps host impact low by collecting only local passive command output, storing a persistent random agent_uuid plus API key locally, using cached policy cadence and jitter, and sending gzip-compressed additive deltas over the existing outbound-only check-in API.

Make the runtime a first-class manual CLI tool: improve --help output, add explicit register-only and check-in-only modes, document direct flag-driven execution from a repo checkout or installed host copy, and add unit tests for the help text and manual mode behavior.

Add admin-driven per-agent API-key rotation/reissue so approved agents can recover from a lost local api_key file without re-enrollment. Update the backend schemas, router, tests, and docs accordingly.

Add passive-agent release packaging and distribution: version the agent explicitly, add an agent changelog, build a release tarball, publish a GHCR agent container image, and extend the GitHub release workflow so master creates agent releases alongside backend and frontend releases. Bump the backend and agent versions and update release/deployment documentation.

Finally, fix the Nix dev shell so the backend test suite can run under nix develop by exposing the missing libstdc++ runtime needed by greenlet/SQLAlchemy async imports.

BadgerOps force-pushed the passive-agent-runtime branch from b42a0e6 to ff0544e · March 22, 2026 21:34

BadgerOps force-pushed the passive-agent-runtime branch from ff0544e to 85dc514 · March 22, 2026 22:49

BadgerOps merged commit 29e1b38 into master Mar 22, 2026
3 checks passed

BadgerOps deleted the passive-agent-runtime branch March 22, 2026 22:53
