epic: Sprint 18 - AAMI Node Agent Implementation #11

@fregataa

Overview

A lightweight Go-based agent that runs on each monitored node, pulls scripts and policies from the Config Server, executes them, and reports status back.

Goals

  • Support large-scale node management in air-gapped environments
  • Overcome limitations of cron-based approach (error handling, state management, retry logic)
  • Integrate script deployment with the Config Server

Non-Goals

  • Push-based execution (SSH) - future phase
  • Real-time WebSocket communication - future phase

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        Config Server                             │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐ │
│  │ Targets  │  │ Scripts  │  │ Policies │  │ Agent API        │ │
│  └──────────┘  └──────────┘  └──────────┘  └──────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
                              ▲
                              │ HTTPS (poll)
                              │
        ┌─────────────────────┼─────────────────────┐
        │                     │                     │
        ▼                     ▼                     ▼
   ┌─────────┐           ┌─────────┐           ┌─────────┐
   │  Agent  │           │  Agent  │           │  Agent  │
   │ Node-01 │           │ Node-02 │           │ Node-03 │
   └─────────┘           └─────────┘           └─────────┘

Key Design Decisions

  1. Pull-based: Firewall/NAT friendly, only outbound connections from nodes
  2. Go single binary: No dependencies, easy deployment in air-gapped environments
  3. Hash-based change detection: Prevent unnecessary re-execution
  4. Graceful degradation: Maintain last state on server connection failure
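Decision 3 can be illustrated with a minimal sketch. It assumes SHA-256 as the hash (the issue does not name an algorithm; a hex SHA-256 digest happens to be 64 characters, which fits the script_hash VARCHAR(64) column in the schema). The names scriptHash and needsRun are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// scriptHash returns the hex-encoded SHA-256 digest of a script body.
// Assumption: SHA-256 is used; the issue only says "hash-based".
func scriptHash(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

// needsRun reports whether a script should be executed: either it has never
// run on this node, or its content hash differs from the last recorded one.
func needsRun(lastHashes map[string]string, name string, body []byte) bool {
	prev, seen := lastHashes[name]
	return !seen || prev != scriptHash(body)
}

func main() {
	last := map[string]string{} // script name -> hash of the last executed version
	body := []byte("#!/bin/sh\necho ok\n")

	fmt.Println(needsRun(last, "check-disk", body)) // true: never executed
	last["check-disk"] = scriptHash(body)
	fmt.Println(needsRun(last, "check-disk", body)) // false: unchanged
	fmt.Println(needsRun(last, "check-disk", append(body, '#'))) // true: content changed
}
```

The same digest could also serve the integrity check, by comparing it against the hash the server advertises before executing.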

Directory Structure

services/aami-agent/
├── cmd/
│   └── agent/
│       └── main.go                 # Entry point
├── internal/
│   ├── config/
│   │   └── config.go               # Configuration management
│   ├── client/
│   │   └── api_client.go           # Config Server API client
│   ├── executor/
│   │   ├── executor.go             # Script execution engine
│   │   └── result.go               # Execution result types
│   ├── poller/
│   │   └── poller.go               # Polling loop
│   ├── state/
│   │   └── state.go                # Local state management (JSON file)
│   └── reporter/
│       └── reporter.go             # Status/result reporting
├── scripts/
│   └── install-agent.sh            # Installation script
├── Dockerfile
├── go.mod
├── go.sum
└── README.md

Implementation Phases

Phase 1: Core Agent (Priority: High)

Basic polling and script execution functionality.

Components:

  • Config Module - YAML configuration loading
  • API Client - Config Server API integration
  • Executor - Script execution with timeout
  • State Manager - Local state persistence (JSON)
  • Poller - Main polling loop
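As a rough illustration of the Executor component, the sketch below runs a shell snippet under a deadline using only the standard library. The Result type and runScript are hypothetical names; the fields loosely follow the script_executions columns. It assumes scripts run through /bin/sh and a Go 1.20+ toolchain (for WaitDelay).

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"os/exec"
	"time"
)

// Result holds the outcome of one script run (hypothetical type).
type Result struct {
	ExitCode   int
	Success    bool
	TimedOut   bool
	Stdout     string
	Stderr     string
	DurationMs int64
}

// runScript executes a shell snippet and kills it once the timeout elapses.
func runScript(script string, timeout time.Duration) Result {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var stdout, stderr bytes.Buffer
	cmd := exec.CommandContext(ctx, "/bin/sh", "-c", script)
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	// Do not wait forever on output pipes held open by orphaned children.
	cmd.WaitDelay = time.Second

	start := time.Now()
	err := cmd.Run()

	res := Result{
		Stdout:     stdout.String(),
		Stderr:     stderr.String(),
		DurationMs: time.Since(start).Milliseconds(),
		TimedOut:   errors.Is(ctx.Err(), context.DeadlineExceeded),
	}
	var exitErr *exec.ExitError
	switch {
	case err == nil:
		res.ExitCode = 0
	case errors.As(err, &exitErr):
		res.ExitCode = exitErr.ExitCode() // -1 when killed by a signal
	default:
		res.ExitCode = -1 // could not start, or another I/O error
	}
	res.Success = err == nil && !res.TimedOut
	return res
}

func main() {
	r := runScript("echo hello; exit 3", 5*time.Second)
	fmt.Printf("exit=%d success=%v timed_out=%v stdout=%q\n",
		r.ExitCode, r.Success, r.TimedOut, r.Stdout)
}
```

Killing only the direct child leaves any grandchildren running; a real executor would probably need process-group handling on top of this.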

Tasks:

  • Create services/aami-agent/ directory structure
  • Implement Config module
  • Implement API Client (using existing /api/v1/checks/target/hostname/:hostname)
  • Implement Executor
  • Implement State Manager
  • Implement Poller
  • Implement Main entry point
  • Unit tests
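The State Manager and Poller could interact roughly as sketched below: state is persisted as a JSON file, and a failed poll leaves the last known state untouched (the graceful-degradation decision above). Everything here is an assumption for illustration: the State layout, the file name, and the pollOnce, loadState, and saveState helpers. The ticker loop that would call pollOnce at each poll interval is omitted.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// State is what the agent persists between polls (hypothetical layout).
type State struct {
	ScriptHashes map[string]string `json:"script_hashes"` // script name -> last hash
}

var errUnreachable = errors.New("config server unreachable")

// defaultStatePath is a placeholder location for the sketch only.
func defaultStatePath() string {
	return filepath.Join(os.TempDir(), "aami-agent-state.json")
}

// loadState reads the state file; a missing file yields an empty state.
func loadState(path string) (State, error) {
	s := State{ScriptHashes: map[string]string{}}
	data, err := os.ReadFile(path)
	if errors.Is(err, os.ErrNotExist) {
		return s, nil
	}
	if err != nil {
		return s, err
	}
	err = json.Unmarshal(data, &s)
	if s.ScriptHashes == nil {
		s.ScriptHashes = map[string]string{}
	}
	return s, err
}

// saveState writes to a temp file and renames it into place, so a crash
// mid-write does not leave a truncated state file behind.
func saveState(path string, s State) error {
	data, err := json.MarshalIndent(s, "", "  ")
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// pollOnce asks the server for the desired script hashes. On failure it
// returns the previous state unchanged and false.
func pollOnce(prev State, fetch func() (map[string]string, error)) (State, bool) {
	hashes, err := fetch()
	if err != nil {
		return prev, false
	}
	return State{ScriptHashes: hashes}, true
}

func main() {
	path := defaultStatePath()
	state, err := loadState(path)
	if err != nil {
		fmt.Println("load failed:", err)
	}
	// Simulate an unreachable Config Server: the state must not change.
	state, ok := pollOnce(state, func() (map[string]string, error) {
		return nil, errUnreachable
	})
	fmt.Println("server reachable:", ok)
	if err := saveState(path, state); err != nil {
		fmt.Println("save failed:", err)
	}
}
```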

Phase 2: Server-side Agent API (Priority: High)

Add Agent-specific API endpoints to Config Server.

New Endpoints:

POST /api/v1/agent/heartbeat          # Heartbeat + node status
POST /api/v1/agent/executions         # Script execution result reporting
GET  /api/v1/agent/config             # Agent configuration (poll interval, etc.)
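A possible client-side view of the heartbeat endpoint is sketched here. The request body is not specified in this issue, so the Heartbeat struct is a guess that borrows field names from the agent_status columns, and the Bearer authorization scheme is likewise an assumption. The sketch only builds the request; it does not send it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Heartbeat is a hypothetical payload modelled on the agent_status table.
type Heartbeat struct {
	Hostname       string `json:"hostname"`
	AgentVersion   string `json:"agent_version"`
	UptimeSeconds  int64  `json:"uptime_seconds"`
	ScriptsTotal   int    `json:"scripts_total"`
	ScriptsSuccess int    `json:"scripts_success"`
	ScriptsFailed  int    `json:"scripts_failed"`
}

// newHeartbeatRequest builds the POST request for /api/v1/agent/heartbeat.
// Assumption: the token is passed as a Bearer token.
func newHeartbeatRequest(server, token string, hb Heartbeat) (*http.Request, error) {
	body, err := json.Marshal(hb)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost,
		server+"/api/v1/agent/heartbeat", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	hb := Heartbeat{Hostname: "node-01", AgentVersion: "0.1.0", UptimeSeconds: 120}
	req, err := newHeartbeatRequest("https://config-server:8080", "example-token", hb)
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```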

Database Schema:

-- Agent status tracking table
CREATE TABLE agent_status (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    target_id UUID NOT NULL REFERENCES targets(id),
    agent_version VARCHAR(50),
    last_heartbeat TIMESTAMPTZ,
    last_poll TIMESTAMPTZ,
    uptime_seconds BIGINT,
    scripts_total INT DEFAULT 0,
    scripts_success INT DEFAULT 0,
    scripts_failed INT DEFAULT 0,
    system_info JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Script execution history table
CREATE TABLE script_executions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    target_id UUID NOT NULL REFERENCES targets(id),
    script_policy_id UUID REFERENCES script_policies(id),
    script_name VARCHAR(255) NOT NULL,
    script_version VARCHAR(50),
    script_hash VARCHAR(64),
    started_at TIMESTAMPTZ NOT NULL,
    finished_at TIMESTAMPTZ,
    duration_ms INT,
    exit_code INT,
    success BOOLEAN,
    stdout TEXT,
    stderr TEXT,
    error_message TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW()
);

Tasks:

  • Agent status domain model
  • Script execution domain model
  • Agent repository implementation
  • Agent service implementation
  • Agent handler implementation
  • Register Agent API in router
  • Database migration

Phase 3: Installation & Deployment (Priority: Medium)

Installation scripts and systemd service configuration.

Components:

  • install-agent.sh - Automated installation script
  • systemd unit file for service management
  • --agent option in bootstrap.sh
  • Dockerfile for containerized deployment

Tasks:

  • install-agent.sh script
  • systemd unit file
  • Add --agent option to bootstrap.sh
  • Dockerfile
  • Add agent example to docker-compose

Phase 4: Web UI Integration (Priority: Medium)

Agent status and execution history UI.

Features:

  • Agent status section on Target detail page
  • Agent version, last heartbeat, uptime display
  • Script execution success/failure counts
  • Execution history table with filtering

Tasks:

  • Agent status API client (Web UI)
  • Agent status component
  • Execution history table component
  • Integration with Target detail page

Phase 5: Advanced Features (Priority: Low)

5.1 Force Execution

  • Trigger immediate script execution from Web UI
  • Agent detects force flag on next poll

5.2 Agent Groups

  • Group-based poll interval configuration
  • Group-based script assignment

5.3 Metrics Export

  • Agent's own Prometheus metrics endpoint
  • aami_agent_poll_duration_seconds
  • aami_agent_scripts_executed_total
  • aami_agent_scripts_failed_total
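A minimal sketch of the metrics endpoint, written against the standard library so it stays self-contained; a real implementation might prefer the official Prometheus Go client instead. It exposes only the two counters in the Prometheus text format. aami_agent_poll_duration_seconds is left out because its metric type (gauge, summary, or histogram) is not specified here. The Metrics type and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// Metrics holds the agent's counters; atomic values keep updates safe
// when the poller and the HTTP handler run concurrently.
type Metrics struct {
	scriptsExecuted atomic.Int64
	scriptsFailed   atomic.Int64
}

// render formats the counters in the Prometheus text exposition format.
func (m *Metrics) render() string {
	return fmt.Sprintf(
		"# TYPE aami_agent_scripts_executed_total counter\n"+
			"aami_agent_scripts_executed_total %d\n"+
			"# TYPE aami_agent_scripts_failed_total counter\n"+
			"aami_agent_scripts_failed_total %d\n",
		m.scriptsExecuted.Load(), m.scriptsFailed.Load())
}

// ServeHTTP lets *Metrics be mounted directly, e.g. http.Handle("/metrics", m).
func (m *Metrics) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "text/plain; version=0.0.4")
	fmt.Fprint(w, m.render())
}

func main() {
	m := &Metrics{}
	m.scriptsExecuted.Add(3)
	m.scriptsFailed.Add(1)
	fmt.Print(m.render())
}
```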

CLI Interface

# Installation
curl -fsSL https://config-server/install-agent.sh | bash -s -- \
  --server https://config-server:8080 \
  --token <bootstrap-token>

# Direct execution (debugging)
aami-agent run \
  --server https://config-server:8080 \
  --hostname "$(hostname)" \
  --poll-interval 30s \
  --verbose

# Status check
aami-agent status

# Force poll
aami-agent poll --now

# Version check
aami-agent version

Migration from Cron

Coexistence Period

  1. Existing cron method and Agent can run simultaneously
  2. Option to auto-disable cron job when installing Agent
  3. Support gradual migration

Migration Procedure

  1. Deploy Agent binary
  2. Install and verify Agent on test node
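  3. Disable the AAMI cron job: remove /etc/cron.d/aami-* (systemctl stop crond also works, but it halts every other cron job on the node)
  4. Start the Agent: systemctl start aami-agent (run systemctl enable aami-agent as well so it starts at boot)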
  5. Establish monitoring and rollback plan

Security Considerations

  1. TLS Communication: HTTPS with Config Server
  2. Token Authentication: Bootstrap token or Agent-specific token
  3. Script Verification: Hash-based integrity check
  4. Least Privilege: Agent requests only necessary permissions
  5. Log Security: Sensitive information masking
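Item 5 might look like the sketch below, which masks values that follow a few secret-looking keys before a line is logged. The key list (token, password, secret) and the maskSecrets name are assumptions, not part of this issue. The pattern only handles key=value and key: value forms, so a flag written as `--token <value>` with a space would need separate handling.

```go
package main

import (
	"fmt"
	"regexp"
)

// secretPattern matches "<key>=<value>" or "<key>: <value>" for a small,
// hypothetical set of sensitive keys, case-insensitively.
var secretPattern = regexp.MustCompile(`(?i)\b(token|password|secret)(\s*[=:]\s*)(\S+)`)

// maskSecrets replaces the value part of each match with asterisks,
// keeping the key and separator so the log line stays readable.
func maskSecrets(line string) string {
	return secretPattern.ReplaceAllString(line, "${1}${2}****")
}

func main() {
	fmt.Println(maskSecrets("starting agent --token=abc123 --verbose"))
	fmt.Println(maskSecrets("password: hunter2"))
	fmt.Println(maskSecrets("poll finished in 42ms"))
}
```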

Success Metrics

Metric                              Target
Agent installation success rate     99%+
Poll success rate                   99.9%+
Script execution success rate       Per policy
Heartbeat interval compliance       95%+ (±5s)
Resource usage                      CPU < 1%, Memory < 50MB

Dependencies

Related Documents

  • Sprint Planning: .agent/planning/sprints/planned/sprint-18-node-agent.md
