Overview
Lightweight Go-based agent that runs on monitored nodes, pulls scripts/policies from Config Server, executes them, and reports status back.
Goals
- Support large-scale node management in air-gapped environments
- Overcome the limitations of the cron-based approach (error handling, state management, retry logic)
- Integrated script deployment mechanism with Config Server
Non-Goals
- Push-based execution (SSH) - future phase
- Real-time WebSocket communication - future phase
Architecture
```
┌──────────────────────────────────────────────────────────────────┐
│                          Config Server                           │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────────────┐  │
│  │ Targets  │  │ Scripts  │  │ Policies │  │    Agent API     │  │
│  └──────────┘  └──────────┘  └──────────┘  └──────────────────┘  │
└──────────────────────────────────────────────────────────────────┘
                                 ▲
                                 │ HTTPS (poll)
                                 │
             ┌───────────────────┼───────────────────┐
             │                   │                   │
             ▼                   ▼                   ▼
        ┌─────────┐         ┌─────────┐         ┌─────────┐
        │  Agent  │         │  Agent  │         │  Agent  │
        │ Node-01 │         │ Node-02 │         │ Node-03 │
        └─────────┘         └─────────┘         └─────────┘
```
Key Design Decisions
- Pull-based: Firewall/NAT friendly, only outbound connections from nodes
- Go single binary: No dependencies, easy deployment in air-gapped environments
- Hash-based change detection: Prevent unnecessary re-execution
- Graceful degradation: Keep the last known state when the Config Server is unreachable
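Hash-based change detection can be as small as comparing a SHA-256 digest of the script body with the digest recorded after the previous run. A minimal sketch, assuming hex-encoded SHA-256 (its 64 characters would fit the `script_hash VARCHAR(64)` column in the Phase 2 schema) and a plain map standing in for the local state file; `scriptHash` and `needsRun` are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// scriptHash returns the hex-encoded SHA-256 of a script body (64 chars).
func scriptHash(body []byte) string {
	sum := sha256.Sum256(body)
	return hex.EncodeToString(sum[:])
}

// needsRun reports whether a script should execute: only when its content
// hash differs from the hash recorded in local state. lastRun maps
// script name -> hash; a missing entry means the script never ran.
func needsRun(lastRun map[string]string, name string, body []byte) bool {
	prev, ok := lastRun[name]
	return !ok || prev != scriptHash(body)
}

func main() {
	state := map[string]string{}
	script := []byte("#!/bin/sh\necho hello\n")

	fmt.Println(needsRun(state, "hello", script)) // true: never ran
	state["hello"] = scriptHash(script)
	fmt.Println(needsRun(state, "hello", script))                    // false: unchanged
	fmt.Println(needsRun(state, "hello", []byte("#!/bin/sh\nexit 1\n"))) // true: changed
}
```

The same digest can double as the integrity check mentioned under Security Considerations: the agent recomputes it over the downloaded body and refuses to run the script if it does not match the hash the server advertised.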
Directory Structure
```
services/aami-agent/
├── cmd/
│   └── agent/
│       └── main.go             # Entry point
├── internal/
│   ├── config/
│   │   └── config.go           # Configuration management
│   ├── client/
│   │   └── api_client.go       # Config Server API client
│   ├── executor/
│   │   ├── executor.go         # Script execution engine
│   │   └── result.go           # Execution result types
│   ├── poller/
│   │   └── poller.go           # Polling loop
│   ├── state/
│   │   └── state.go            # Local state management (JSON file)
│   └── reporter/
│       └── reporter.go         # Status/result reporting
├── scripts/
│   └── install-agent.sh        # Installation script
├── Dockerfile
├── go.mod
├── go.sum
└── README.md
```
Implementation Phases
Phase 1: Core Agent (Priority: High)
Basic polling and script execution functionality.
Components:
- Config Module - YAML configuration loading
- API Client - Config Server API integration
- Executor - Script execution with timeout
- State Manager - Local state persistence (JSON)
- Poller - Main polling loop
Tasks:
- Create `services/aami-agent/` directory structure
- Implement Config module
- Implement API Client (using existing `/api/v1/checks/target/hostname/:hostname`)
- Implement Executor
- Implement State Manager
- Implement Poller
- Implement Main entry point
- Unit tests
Phase 2: Server-side Agent API (Priority: High)
Add Agent-specific API endpoints to Config Server.
New Endpoints:
```
POST /api/v1/agent/heartbeat    # Heartbeat + node status
POST /api/v1/agent/executions   # Script execution result reporting
GET  /api/v1/agent/config       # Agent configuration (poll interval, etc.)
```
Database Schema:
```sql
-- Agent status tracking table
CREATE TABLE agent_status (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    target_id UUID NOT NULL REFERENCES targets(id),
    agent_version VARCHAR(50),
    last_heartbeat TIMESTAMPTZ,
    last_poll TIMESTAMPTZ,
    uptime_seconds BIGINT,
    scripts_total INT DEFAULT 0,
    scripts_success INT DEFAULT 0,
    scripts_failed INT DEFAULT 0,
    system_info JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- Script execution history table
CREATE TABLE script_executions (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    target_id UUID NOT NULL REFERENCES targets(id),
    script_policy_id UUID REFERENCES script_policies(id),
    script_name VARCHAR(255) NOT NULL,
    script_version VARCHAR(50),
    script_hash VARCHAR(64),
    started_at TIMESTAMPTZ NOT NULL,
    finished_at TIMESTAMPTZ,
    duration_ms INT,
    exit_code INT,
    success BOOLEAN,
    stdout TEXT,
    stderr TEXT,
    error_message TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW()
);
```
Tasks:
- Agent status domain model
- Script execution domain model
- Agent repository implementation
- Agent service implementation
- Agent handler implementation
- Register Agent API in router
- Database migration
Phase 3: Installation & Deployment (Priority: Medium)
Installation scripts and systemd service configuration.
Components:
- `install-agent.sh`: Automated installation script
- systemd unit file for service management
- `--agent` option in `bootstrap.sh`
- Dockerfile for containerized deployment
Tasks:
- install-agent.sh script
- systemd unit file
- Add --agent option to bootstrap.sh
- Dockerfile
- Add agent example to docker-compose
Phase 4: Web UI Integration (Priority: Medium)
Agent status and execution history UI.
Features:
- Agent status section on Target detail page
- Agent version, last heartbeat, uptime display
- Script execution success/failure counts
- Execution history table with filtering
Tasks:
- Agent status API client (Web UI)
- Agent status component
- Execution history table component
- Integration with Target detail page
Phase 5: Advanced Features (Priority: Low)
5.1 Force Execution
- Trigger immediate script execution from Web UI
- Agent detects force flag on next poll
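One subtlety with a force flag is that the agent must not re-run the script on every poll while the flag is still set. A minimal sketch, assuming the server exposes a hypothetical per-script `ForceGeneration` counter that it increments on each "run now" request: the agent runs when the counter is newer than the one it last recorded, and otherwise falls back to hash comparison. `PolicyView`, `Local`, and `decide` are illustrative names.

```go
package main

import "fmt"

// PolicyView is a hypothetical slice of what a poll returns per script.
type PolicyView struct {
	Name            string
	Hash            string
	ForceGeneration int64 // incremented by the server on each "run now" request
}

// Local holds what the agent persisted after its previous run of a script.
// The zero value means the script has never run.
type Local struct {
	Hash            string
	ForceGeneration int64
}

// decide reports whether to run a script, and why.
func decide(p PolicyView, l Local) (bool, string) {
	switch {
	case p.ForceGeneration > l.ForceGeneration:
		return true, "forced"
	case p.Hash != l.Hash:
		return true, "changed"
	default:
		return false, "unchanged"
	}
}

func main() {
	local := Local{Hash: "h1", ForceGeneration: 3}
	fmt.Println(decide(PolicyView{Name: "a", Hash: "h1", ForceGeneration: 3}, local)) // false unchanged
	fmt.Println(decide(PolicyView{Name: "a", Hash: "h1", ForceGeneration: 4}, local)) // true forced
	fmt.Println(decide(PolicyView{Name: "a", Hash: "h2", ForceGeneration: 3}, local)) // true changed
}
```

After a forced or changed run, the agent would persist `p.Hash` and `p.ForceGeneration` into its local state, so the next poll sees nothing new. A monotonically increasing counter avoids an extra round trip to clear the flag on the server.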
5.2 Agent Groups
- Group-based poll interval configuration
- Group-based script assignment
5.3 Metrics Export
- Agent's own Prometheus metrics endpoint exposing `aami_agent_poll_duration_seconds`, `aami_agent_scripts_executed_total`, and `aami_agent_scripts_failed_total`
CLI Interface
```bash
# Installation
curl -fsSL https://config-server/install-agent.sh | bash -s -- \
  --server https://config-server:8080 \
  --token <bootstrap-token>

# Direct execution (debugging)
aami-agent run \
  --server https://config-server:8080 \
  --hostname $(hostname) \
  --poll-interval 30s \
  --verbose

# Status check
aami-agent status

# Force poll
aami-agent poll --now

# Version check
aami-agent version
```
Migration from Cron
Coexistence Period
- Existing cron method and Agent can run simultaneously
- Option to auto-disable cron job when installing Agent
- Support gradual migration
Migration Procedure
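- Deploy the Agent binary
- Install and verify the Agent on a test node
- Stop the cron job: remove `/etc/cron.d/aami-*`, or stop the cron daemon (`systemctl stop crond`; the unit is `cron` on Debian/Ubuntu), which disables every cron job on the node
- Enable the Agent: `systemctl enable --now aami-agent` (starts it now and on every boot)
- Establish monitoring and a rollback plan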
Security Considerations
- TLS Communication: HTTPS with Config Server
- Token Authentication: Bootstrap token or Agent-specific token
- Script Verification: Hash-based integrity check
- Least Privilege: Agent requests only necessary permissions
- Log Security: Sensitive information masking
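For log masking, a small set of regular expressions applied before a line is written is a simple starting point. A minimal sketch; the patterns (bearer headers, a `--token` flag like the one in the installation command, and `password=`-style pairs) are illustrative assumptions, not an exhaustive list. The key or flag is kept and only the value is replaced, so logs stay readable.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical patterns; group 1 is the prefix to keep, the rest is masked.
var secretPatterns = []*regexp.Regexp{
	regexp.MustCompile(`(?i)(authorization:\s*bearer\s+)\S+`),
	regexp.MustCompile(`(--token[=\s]+)\S+`),
	regexp.MustCompile(`(?i)((?:password|passwd|secret|api_key)\s*[=:]\s*)\S+`),
}

// maskSecrets replaces the secret part of each match with "****".
func maskSecrets(s string) string {
	for _, re := range secretPatterns {
		s = re.ReplaceAllString(s, "${1}****")
	}
	return s
}

func main() {
	fmt.Println(maskSecrets("install --server https://cs:8080 --token abc123"))
	fmt.Println(maskSecrets("Authorization: Bearer eyJhbGciOi"))
	fmt.Println(maskSecrets("db password=hunter2 ok"))
}
```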
Success Metrics
| Metric | Target |
|---|---|
| Agent installation success rate | 99%+ |
| Poll success rate | 99.9%+ |
| Script execution success rate | Per policy |
| Heartbeat interval compliance | 95%+ (±5s) |
| Resource usage | CPU < 1%, Memory < 50MB |
Dependencies
- #3 feat(config-server): Implement generic async Job Manager
- #4 feat(config-server): Apply Job Manager to long-running API endpoints
- Go 1.23+
- Config Server API (existing)
- PostgreSQL (existing)
- systemd (Linux nodes)
Related Documents
- Sprint Planning: `.agent/planning/sprints/planned/sprint-18-node-agent.md`