
Add deployable passive agent runtime#54

Merged
BadgerOps merged 1 commit into master from passive-agent-runtime
Mar 22, 2026

BadgerOps (Owner) commented Mar 22, 2026

Summary

This PR delivers the first deployable passive-agent slice for Graphēon end to end.

It adds:

  • a lightweight host-side passive runtime
  • direct manual CLI execution with documented flags and --help
  • systemd deployment units and install helper
  • passive collection from local commands only
  • gzip-compressed outbound-only check-ins
  • admin-driven per-agent API-key rotation/reissue
  • versioned agent packaging and release automation
  • an agent GHCR container image and release tarball
  • docs and quickstart updates for manual, systemd, artifact, and container usage
  • a small Nix shell fix so the backend test suite runs cleanly in the dev environment

Closes #48.
Related: #53.

What Changed

Host-side passive runtime

Added a new stdlib-only runtime under agent/:

  • agent/grapheon_agent.py
  • agent/tests/test_grapheon_agent.py
  • agent/README.md
  • agent/VERSION
  • agent/CHANGELOG.md

Runtime behavior:

  • generates and persists a random agent_uuid
  • stores local state under /var/lib/grapheon-agent
  • registers outbound using the existing enrollment-key flow
  • waits cleanly in pending state until approval
  • stores the one-time per-agent API key locally once issued
  • uses cached backend policy for cadence, jitter, timeouts, and command enablement
  • computes additive deltas against the last successful snapshot
  • sends gzip-compressed JSON reports to POST /api/agents/check-in
  • reports its version from agent/VERSION instead of a hardcoded string
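A minimal sketch of the outbound check-in request shape, using only the standard library to match the stdlib-only runtime. The endpoint path comes from this PR. The `Authorization: Bearer` header and the report field names (`agent_uuid`, `agent_version`, `delta`) are assumptions for illustration, not the agent's confirmed wire format.

```python
import gzip
import json
import urllib.request


def build_check_in_request(base_url: str, api_key: str, report: dict) -> urllib.request.Request:
    """Build, but do not send, a gzip-compressed JSON check-in request."""
    # Compact separators keep the uncompressed body small before gzip.
    body = json.dumps(report, separators=(",", ":"), sort_keys=True).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/agents/check-in",
        data=gzip.compress(body),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Content-Encoding": "gzip",
            # Assumption: bearer-style header; the real agent/backend header may differ.
            "Authorization": f"Bearer {api_key}",
        },
    )


# Hypothetical report fields, for illustration only.
req = build_check_in_request(
    "https://grapheon.example",
    "example-key",
    {"agent_uuid": "00000000-0000-4000-8000-000000000001", "agent_version": "0.2.0", "delta": {}},
)
# Sending is a single call; the timeout would come from cached backend policy:
# urllib.request.urlopen(req, timeout=10)
```

Compressing client-side keeps each check-in small on the wire, which fits the low network-impact goal.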

Passive/local-only command set in this slice:

  • ip -json addr show
  • ip -json neigh show
  • ip -json route show
  • ss -tunapH
  • netstat -tunap as a fallback

This keeps the runtime aligned with the MVP constraints:

  • outbound only
  • no active scanning
  • low CPU/memory/disk/network impact
  • one-shot execution model instead of a heavyweight daemon
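One way to honor the passive command set and the `ss` to `netstat` fallback is sketched below. The command lists come from this PR. The helper names, the `which` injection point, and the 10-second timeout are assumptions made for illustration.

```python
import shutil
import subprocess
from typing import Callable, List, Optional

# The passive, read-only commands listed in this PR.
IP_COMMANDS = [
    ["ip", "-json", "addr", "show"],
    ["ip", "-json", "neigh", "show"],
    ["ip", "-json", "route", "show"],
]
# Preferred socket listing first, then the netstat fallback.
SOCKET_COMMANDS = [["ss", "-tunapH"], ["netstat", "-tunap"]]


def pick_socket_command(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[List[str]]:
    """Return the first installed socket-listing command, or None if neither exists."""
    for cmd in SOCKET_COMMANDS:
        if which(cmd[0]) is not None:
            return cmd
    return None


def run_passive(cmd: List[str], timeout_s: float = 10.0) -> Optional[str]:
    """Run one local command with a hard timeout; return stdout, or None on any failure."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s, check=False)
    except (OSError, subprocess.TimeoutExpired):
        return None
    return result.stdout if result.returncode == 0 else None
```

A missing tool or a timeout degrades to "no data for this section" instead of failing the whole run.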

Manual CLI mode

The agent is now a first-class command-line tool, not just something hidden behind a systemd oneshot unit.

Added and documented:

  • richer --help output with examples
  • --register-only
  • --check-in-only
  • direct invocation from a repo checkout or an installed host copy
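The manual flags could be wired up with `argparse` roughly as follows. `--register-only` and `--check-in-only` are the flags from this PR. Treating them as mutually exclusive is an assumption, and `--state-dir` is a hypothetical flag name used only for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="grapheon_agent.py",
        description="Graphēon passive agent: one run, outbound only.",
        epilog="example: python3 agent/grapheon_agent.py --register-only",
    )
    # Assumption: the two manual modes cannot be combined.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--register-only", action="store_true",
                      help="register or poll approval, then exit without collecting")
    mode.add_argument("--check-in-only", action="store_true",
                      help="force an immediate check-in from existing local state")
    # Hypothetical flag name; the real option for the state directory may differ.
    parser.add_argument("--state-dir", default="/var/lib/grapheon-agent",
                        help="local state directory (default: %(default)s)")
    return parser
```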

Manual workflows now supported explicitly:

  • register or poll approval without collecting
  • force an immediate check-in from a local state directory
  • run with an installed config file without using systemd

This makes the runtime easier to test, debug, and operate during rollout.
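Register-only runs mostly come down to handling the registration response: exit cleanly while pending, and persist the one-time API key as soon as it is issued. This sketch assumes a hypothetical response shape with `status` and `api_key` fields; the `api_key` file name matches the `/var/lib/grapheon-agent/api_key` path mentioned in this PR.

```python
import os
from pathlib import Path


def handle_registration_response(state_dir: Path, response: dict) -> str:
    """Persist a newly issued API key and report the agent's state.

    Hypothetical response shape:
      {"status": "pending"}  or  {"status": "active", "api_key": "<raw secret>"}
    """
    status = response.get("status")
    if status == "pending":
        # Not approved yet: exit cleanly; the next run polls again.
        return "pending"
    if status == "active":
        raw_key = response.get("api_key")
        if raw_key:  # the raw key is only returned once, so store it immediately
            fd = os.open(state_dir / "api_key", os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
            os.fchmod(fd, 0o600)  # tighten perms even if the file already existed
            with os.fdopen(fd, "w") as fh:
                fh.write(raw_key)
        return "active"
    raise ValueError(f"unexpected registration status: {status!r}")
```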

Deployment artifacts and container

Added:

  • deploy/grapheon-agent.service
  • deploy/grapheon-agent.timer
  • deploy/grapheon-agent.env.example
  • scripts/install-passive-agent.sh
  • scripts/build-agent-artifact.sh
  • agent/Dockerfile

Deployment/distribution model:

  • systemd oneshot service
  • systemd timer every 15 minutes by default
  • local runtime gating based on cached backend policy
  • simple installer that places runtime files under /opt/grapheon, seeds /etc/grapheon-agent.env, and creates /var/lib/grapheon-agent
  • versioned release tarball: grapheon-agent-vX.Y.Z.tar.gz
  • GHCR image: ghcr.io/badgerops/grapheon-agent:latest and :vX.Y.Z

The agent image is intended for per-host deployment and is documented with host network/PID namespace examples for containerized runs.
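With a fixed 15-minute timer, most firings should exit early unless cached policy says a check-in is due. A minimal sketch of that gate, assuming hypothetical `interval_s`/`jitter_s` policy values and a deterministic per-agent jitter derived from `agent_uuid` (the real runtime may apply jitter differently, for example randomly per run):

```python
import uuid
from typing import Optional


def jitter_offset(agent_uuid: str, jitter_s: int) -> int:
    """Stable per-agent offset in [0, jitter_s] so a fleet does not check in in lockstep."""
    if jitter_s <= 0:
        return 0
    return uuid.UUID(agent_uuid).int % (jitter_s + 1)


def is_due(now: float, last_success: Optional[float], interval_s: int,
           jitter_s: int, agent_uuid: str) -> bool:
    """Decide whether this timer firing should collect and check in."""
    if last_success is None:
        return True  # never checked in successfully: run now
    return now >= last_success + interval_s + jitter_offset(agent_uuid, jitter_s)
```

Keeping the schedule decision in the runtime means the backend can change cadence through policy without touching the systemd units on each host.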

GitHub release automation

Extended .github/workflows/release.yml so merges to master now also release the passive agent.

New behavior:

  • reads agent/VERSION
  • validates agent/CHANGELOG.md
  • checks whether an agent-vX.Y.Z tag already exists
  • creates the agent-vX.Y.Z tag and GitHub release
  • uploads grapheon-agent-vX.Y.Z.tar.gz to that release
  • builds and pushes ghcr.io/badgerops/grapheon-agent:latest
  • builds and pushes ghcr.io/badgerops/grapheon-agent:vX.Y.Z

The workflow name is updated from Release Images to Release Components to reflect the broader scope.
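The version and changelog checks, plus the names the workflow derives from them, can be sketched in Python. This is not the contents of `scripts/validate_versions.py` or `release.yml`; in particular, the `## X.Y.Z` / `## [X.Y.Z]` changelog heading format is an assumption.

```python
import re

SEMVER = re.compile(r"\d+\.\d+\.\d+")


def agent_release_plan(version_text: str, changelog_text: str) -> dict:
    """Validate agent/VERSION against agent/CHANGELOG.md and derive release names."""
    version = version_text.strip()
    if not SEMVER.fullmatch(version):
        raise ValueError(f"agent/VERSION must be X.Y.Z, got {version!r}")
    # Assumption: each release has a "## X.Y.Z" or "## [X.Y.Z]" heading.
    heading = re.compile(rf"^##\s+\[?{re.escape(version)}(?![\d.])", re.MULTILINE)
    if not heading.search(changelog_text):
        raise ValueError(f"agent/CHANGELOG.md has no entry for {version}")
    return {
        "tag": f"agent-v{version}",
        "tarball": f"grapheon-agent-v{version}.tar.gz",
        "images": [
            "ghcr.io/badgerops/grapheon-agent:latest",
            f"ghcr.io/badgerops/grapheon-agent:v{version}",
        ],
    }
```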

Backend follow-up: API-key recovery

The runtime surfaced an operational gap: if an approved agent loses /var/lib/grapheon-agent/api_key, there was no recovery path short of re-enrollment.

This PR adds:

  • POST /api/agents/{id}/rotate-api-key

Behavior:

  • admin-only
  • only valid for active agents
  • rotates the stored per-agent API key
  • returns the new raw secret once
  • immediately invalidates the previous key

This keeps bootstrap and steady-state auth separate while giving operators a clean recovery path.
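The rotation semantics above (admin-only, active agents only, raw secret returned once, old key invalid immediately) can be modeled in a few lines. This in-memory sketch is not the `backend/routers/agents.py` implementation; the `AgentRecord` type and storing only a SHA-256 digest of the key are assumptions.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass
class AgentRecord:
    status: str             # "pending" or "active"
    api_key_hash: str = ""  # assumption: only a digest of the key is stored


def _digest(raw_key: str) -> str:
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def rotate_api_key(agent: AgentRecord, caller_is_admin: bool) -> str:
    """Issue a fresh per-agent key; the previous key stops working immediately."""
    if not caller_is_admin:
        raise PermissionError("API-key rotation is admin-only")
    if agent.status != "active":
        raise ValueError("only active agents can have their key rotated")
    raw_key = secrets.token_urlsafe(32)
    agent.api_key_hash = _digest(raw_key)  # overwrite: old key no longer verifies
    return raw_key  # returned once; not retrievable again


def verify_api_key(agent: AgentRecord, presented: str) -> bool:
    # Constant-time comparison of digests.
    return bool(agent.api_key_hash) and hmac.compare_digest(agent.api_key_hash, _digest(presented))
```

After rotating, the operator would place the returned secret in `/var/lib/grapheon-agent/api_key` on the host so the next run authenticates with it.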

Version bumps and changelogs

Updated:

  • backend/VERSION → 0.10.0
  • backend/CHANGELOG.md
  • agent/VERSION → 0.2.0
  • agent/CHANGELOG.md
  • scripts/validate_versions.py now validates agent version/changelog sync too

Documentation

Updated:

  • README.md
  • docs/README.md
  • docs/agents.md
  • docs/agent_quickstart.md
  • docs/deployment.md
  • docs/release-process.md
  • backend/README.md
  • agent/README.md

Docs now describe:

  • the shipped runtime
  • direct CLI usage and --help
  • register-only and check-in-only modes
  • install from repo checkout or release tarball
  • containerized agent runs from GHCR
  • the install and approval flow
  • the systemd timer model
  • local agent state files
  • current delta behavior and limitations
  • API-key rotation/recovery workflow
  • the new component release process for the agent

Dev shell fix

Updated flake.nix to include:

  • iproute2
  • nettools
  • util-linux
  • stdenv.cc.cc.lib

Also exports LD_LIBRARY_PATH for the C++ runtime library (libstdc++) so greenlet and SQLAlchemy async imports work under nix develop.

That change unblocked the previously failing backend test collection in this environment.

Key Decisions

One-shot runtime, not a long-running daemon

The agent runs as a short-lived process from a systemd timer by default, but the same runtime can now be executed directly with flags for manual workflows. That preserves the low-footprint design while making rollout and debugging practical.

Random persisted agent_uuid

Identity remains a locally persisted random UUID, not a MAC-derived identifier. MACs are still collected as metadata but are not treated as identity.
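A load-or-create pattern for the persisted identity might look like this. The `agent_uuid` file name inside the state directory is hypothetical; the atomic temp-file-then-rename write is a common choice, not necessarily what the runtime does.

```python
import os
import tempfile
import uuid
from pathlib import Path


def load_or_create_agent_uuid(state_dir: Path) -> str:
    """Return the persisted agent_uuid, creating a random one on first run."""
    path = state_dir / "agent_uuid"  # hypothetical file name
    if path.exists():
        # uuid.UUID() raises ValueError on a corrupted file instead of silently re-keying.
        return str(uuid.UUID(path.read_text().strip()))
    state_dir.mkdir(parents=True, exist_ok=True)
    new_id = str(uuid.uuid4())  # random; deliberately not derived from any MAC
    # Atomic write: temp file in the same directory, fsync, then rename over the target.
    fd, tmp_path = tempfile.mkstemp(dir=str(state_dir))
    with os.fdopen(fd, "w") as fh:
        fh.write(new_id + "\n")
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp_path, path)
    return new_id
```

Because the identifier is random and stored locally, replacing a NIC or changing MAC addresses does not change the agent's identity.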

Additive delta reports

The runtime sends additive/update-only deltas based on the last successful snapshot. This keeps reports small and avoids needing delete semantics the backend does not yet model.

Current limitation:

  • removals are not represented yet
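The additive delta rule can be expressed as a small diff over keyed records. The snapshot shape (section name to a dict of keyed records) is an assumption for illustration; the key property is that records missing from the current snapshot are simply not reported.

```python
from typing import Any, Dict

# Hypothetical snapshot shape: section name -> {record key -> record}.
Snapshot = Dict[str, Dict[str, Any]]


def additive_delta(previous: Snapshot, current: Snapshot) -> Snapshot:
    """Return records that are new or changed since the last successful snapshot.

    Records missing from `current` are ignored on purpose: removals are not
    represented yet.
    """
    delta: Snapshot = {}
    for section, records in current.items():
        before = previous.get(section, {})
        changed = {
            key: value
            for key, value in records.items()
            if key not in before or before[key] != value
        }
        if changed:
            delta[section] = changed
    return delta
```

Under this model the stored snapshot would only be replaced after a check-in succeeds, so a failed POST is retried with the same delta on the next run.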

Recovery via API-key rotation

Instead of overloading registration or requiring re-enrollment, recovery is handled by an explicit admin rotation endpoint. That keeps the agent auth model simple and operationally clear.

Agent-specific versioning and releases

The passive agent now has its own version and changelog. That keeps host-runtime release cadence independent from backend/frontend changes while still using the same GitHub/GHCR release flow.

Files Of Interest

  • agent/grapheon_agent.py
  • agent/Dockerfile
  • agent/VERSION
  • agent/CHANGELOG.md
  • agent/tests/test_grapheon_agent.py
  • scripts/build-agent-artifact.sh
  • scripts/install-passive-agent.sh
  • .github/workflows/release.yml
  • backend/routers/agents.py
  • backend/schemas.py
  • backend/tests/test_agents.py
  • backend/VERSION
  • backend/CHANGELOG.md
  • docs/agent_quickstart.md
  • docs/agents.md
  • docs/release-process.md
  • docs/deployment.md
  • flake.nix

Testing

Ran:

  • python3 scripts/validate_versions.py
  • bash -n scripts/build-agent-artifact.sh
  • bash scripts/build-agent-artifact.sh 0.2.0 /tmp/grapheon-agent-v0.2.0.tar.gz
  • tar -tzf /tmp/grapheon-agent-v0.2.0.tar.gz
  • python3 agent/grapheon_agent.py --help
  • nix develop -c .venv/bin/python -m pytest

Result:

  • version/changelog validation passes
  • release tarball contents look correct
  • full suite passes in this branch: 367 passed, 1 skipped

Follow-Up Work

#53 Package and distribute the passive agent runtime

This PR adds the first release artifact and GHCR image, but broader packaging/distribution work is still tracked in #53.

Likely deliverables there:

  • upgrade/rollback workflow around released artifacts
  • optional .deb / .rpm packaging
  • stronger artifact signing/integrity story

Remaining broader agent work

Still out of scope here:

  • frontend fleet and approval UI
  • richer agent operational views and health summaries
  • server-side raw-command ingestion or alternate collectors
  • optional mTLS hardening, if we decide to add it

BadgerOps force-pushed the passive-agent-runtime branch from b42a0e6 to ff0544e on March 22, 2026 at 21:34
Ship the first host-side passive agent runtime for Graphēon with a lightweight one-shot collector, systemd service/timer units, installation helper, agent tests, and updated operator docs. The runtime keeps host impact low by collecting only local passive command output, storing a persistent random agent_uuid plus API key locally, using cached policy cadence and jitter, and sending gzip-compressed additive deltas over the existing outbound-only check-in API.

Make the runtime a first-class manual CLI tool: improve --help output, add explicit register-only and check-in-only modes, document direct flag-driven execution from a repo checkout or installed host copy, and add unit tests for the help text and manual mode behavior.

Add admin-driven per-agent API-key rotation/reissue so approved agents can recover from a lost local api_key file without re-enrollment. Update the backend schemas, router, tests, and docs accordingly.

Add passive-agent release packaging and distribution: version the agent explicitly, add an agent changelog, build a release tarball, publish a GHCR agent container image, and extend the GitHub release workflow so master creates agent releases alongside backend and frontend releases. Bump the backend and agent versions and update release/deployment documentation.

Finally, fix the Nix dev shell so the backend test suite can run under nix develop by exposing the missing libstdc++ runtime needed by greenlet/SQLAlchemy async imports.
BadgerOps force-pushed the passive-agent-runtime branch from ff0544e to 85dc514 on March 22, 2026 at 22:49
BadgerOps merged commit 29e1b38 into master on March 22, 2026
3 checks passed
BadgerOps deleted the passive-agent-runtime branch on March 22, 2026 at 22:53
Development

Linked issue:

Add deployable passive agent with outbound-only check-in
