Update release with new fabrica-based services; remove old services #50
travisbcotton wants to merge 60 commits into
Force-pushed from 9bf779d to ef8d070.
Just a couple of other notes before merging. We need to update the
We also need to update
Another note: we're going to update the CoreDHCP config in
Here's a snippet of what the tutorial config should look like after the changes:

```yaml
- coresmd: |
    svc_base_uri=https://demo.openchami.cluster:8443
    ipxe_base_uri=http://172.16.0.254:8081
    ca_cert=/root_ca/root_ca.crt
    cache_valid=30s
    lease_time=1h
    single_port=false
- bootloop: |
    lease_file=/tmp/coredhcp.db
    script_path=default
    lease_time=5m
    ipv4_start=172.16.0.200
    ipv4_end=172.16.0.250
```
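As a quick sanity check on the `bootloop` pool in the snippet above, here's a minimal shell sketch that counts the addresses between `ipv4_start` and `ipv4_end`. It assumes both endpoints are assignable (an inclusive range). `ip_to_int` and `pool_size` are hypothetical helpers, not part of coresmd or CoreDHCP.

```shell
#!/bin/sh
# Hypothetical helpers for checking a bootloop address pool; not part of coresmd.

# Convert a dotted-quad IPv4 address to an integer.
# The body runs in a subshell so the IFS change and `set --` stay local.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Print the number of addresses in the inclusive range START..END.
pool_size() (
  start=$(ip_to_int "$1")
  end=$(ip_to_int "$2")
  if [ "$end" -lt "$start" ]; then
    echo "end address $2 is before start address $1" >&2
    exit 1
  fi
  echo $(( end - start + 1 ))
)

# Range from the bootloop snippet above.
pool_size 172.16.0.200 172.16.0.250   # prints 51
```

With `lease_time=5m`, that pool only has to cover nodes that are waiting to be discovered, so 51 addresses is likely plenty for the tutorial.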
A couple of changes:
Force-pushed from 8ab666e to fc525d0.
We'll need
We'll have to note these major changes in the release notes once this is merged. We'll want to bump the minor version on the tag.
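For the tag bump, here's a minimal sketch of a minor-version increment. It assumes tags follow `vMAJOR.MINOR.PATCH`. `bump_minor` is a hypothetical helper, and `v1.4.2` is a placeholder, not this repo's current tag.

```shell
#!/bin/sh
# Hypothetical helper: assumes release tags look like vMAJOR.MINOR.PATCH.
# Bumping MINOR resets PATCH to 0, per Semantic Versioning.
bump_minor() (
  tag=${1#v}                     # drop a leading "v"
  case "$tag" in
    *.*.*) ;;                    # need at least three dot-separated parts
    *) echo "unexpected tag: $1" >&2; exit 1 ;;
  esac
  major=${tag%%.*}               # text before the first dot
  rest=${tag#*.}                 # text after the first dot
  minor=${rest%%.*}              # text between the first and second dots
  for part in "$major" "$minor"; do
    case "$part" in
      ''|*[!0-9]*) echo "unexpected tag: $1" >&2; exit 1 ;;
    esac
  done
  echo "v$major.$((minor + 1)).0"
)

bump_minor v1.4.2   # prints v1.5.0 (placeholder tag)
```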
Force-pushed from 651ce7d to caa1bd3.
Should we provide a

Edit: Just to add, here's the default boot-service `config.yaml` (`systemd/configs/boot-service.yaml`):

```yaml
# SPDX-FileCopyrightText: 2025 OpenCHAMI Contributors
#
# SPDX-License-Identifier: MIT
# OpenCHAMI Boot Service Configuration Example
#
# This is a comprehensive example configuration file for the OpenCHAMI boot service.
# To use this configuration:
# 1. Copy this file to config.yaml: cp config.example.yaml config.yaml
# 2. Customize the settings below for your environment
# 3. Remove or comment out sections you don't need
#
# Configuration precedence (highest to lowest):
# 1. Command-line flags
# 2. Environment variables (e.g., BOOT_SERVICE_PORT=8082)
# 3. Configuration file (config.yaml)
# 4. Default values
# =============================================================================
# SERVER CONFIGURATION
# =============================================================================
# HTTP server settings
port: 8082 # Port to listen on
host: "0.0.0.0" # Interface to bind to (0.0.0.0 for all interfaces)
read_timeout: 30 # HTTP read timeout in seconds
write_timeout: 30 # HTTP write timeout in seconds
idle_timeout: 120 # HTTP idle timeout in seconds
# =============================================================================
# STORAGE CONFIGURATION
# =============================================================================
# Data storage settings
data_dir: "./data" # Directory for storing boot configurations
storage_type: "file" # Storage backend: "file", "database" (future)
# Database settings (when storage_type: "database")
# database:
#   driver: "postgres"
#   host: "localhost"
#   port: 5432
#   name: "boot_service"
#   user: "boot_user"
#   password: "boot_password"
#   ssl_mode: "require"
#   max_connections: 25
#   connection_timeout: 30
# =============================================================================
# FEATURE TOGGLES
# =============================================================================
# Authentication
enable_auth: false # Enable TokenSmith JWT authentication
# Set to true for production environments
# Metrics and monitoring
enable_metrics: true # Enable Prometheus metrics endpoint
metrics_port: 9092 # Port for metrics endpoint (/metrics)
# API compatibility
enable_legacy_api: true # Enable legacy BSS-compatible endpoints
# Disable to force use of new API only
# =============================================================================
# AUTHENTICATION CONFIGURATION (when enable_auth: true)
# =============================================================================
auth:
  # Core authentication settings
  enabled: false # Must match enable_auth above
  # JWT validation method (choose one):
  # Option 1: JWKS URL (recommended for production)
  jwks_url: "https://auth.openchami.org/.well-known/jwks.json"
  jwks_refresh_interval: "1h" # How often to refresh JWKS cache
  # Option 2: Static RSA public key (for development/testing)
  # jwt_public_key: |
  #   -----BEGIN PUBLIC KEY-----
  #   MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEA...
  #   -----END PUBLIC KEY-----
  # JWT validation options
  jwt_issuer: "https://auth.openchami.org" # Expected token issuer
  jwt_audience: "boot-service" # Expected token audience
  validate_expiration: true # Check token expiration
  validate_issuer: true # Validate issuer claim
  validate_audience: true # Validate audience claim
  # Authorization requirements
  required_claims: ["sub", "iss", "aud"] # Required JWT claims
  required_scopes: ["boot:read"] # Required OAuth2 scopes
  # Development/testing options (never use in production)
  allow_empty_token: false # Allow requests without tokens
  non_enforcing: false # Log auth failures but don't block requests
# =============================================================================
# HARDWARE STATE MANAGER INTEGRATION
# =============================================================================
# HSM (Hardware State Manager) settings
hsm_url: "http://localhost:27779" # URL of the HSM service
# Set to your HSM endpoint
# TokenSmith-backed HSM service authentication
# When both hsm_url and tokensmith_url are configured, boot-service exchanges a
# bootstrap token for short-lived service tokens and adds them to HSM requests.
# Standardized env vars: TOKENSMITH_URL, TOKENSMITH_BOOTSTRAP_TOKEN,
# TOKENSMITH_TARGET_SERVICE, TOKENSMITH_SCOPES, TOKENSMITH_REFRESH_SKEW_SEC
tokensmith_url: "http://localhost:8080"
tokensmith_target_service: "hsm"
tokensmith_scopes: "hsm:read"
tokensmith_refresh_skew_sec: 120
# tokensmith_bootstrap_token: "<bootstrap-jwt>" # Prefer env var for secrets
# Environment fallback: TOKENSMITH_BOOTSTRAP_TOKEN
# HSM authentication (when HSM requires auth)
# hsm_auth:
#   type: "service_token" # Authentication type for HSM
#   service_name: "boot-service"
#   token_endpoint: "http://tokensmith:8080/token"
# =============================================================================
# EXTERNAL SERVICES
# =============================================================================
# TokenSmith authentication service (when enable_auth: true)
tokensmith:
  url: "http://localhost:8080" # TokenSmith service URL
  timeout: 30 # Request timeout in seconds
  # Service-to-service authentication
  service_auth:
    enabled: false # Enable service tokens
    service_name: "boot-service" # This service's identifier
    token_endpoint: "/token" # Token endpoint path
# BSS (Boot Script Service) integration
bss:
  enabled: false # Enable BSS integration
  url: "http://localhost:27778" # BSS service URL
  timeout: 30 # Request timeout in seconds
# =============================================================================
# LOGGING AND MONITORING
# =============================================================================
# Logging configuration
logging:
  level: "info" # Log level: debug, info, warn, error
  format: "json" # Log format: json, text
  output: "stdout" # Log output: stdout, stderr, file
  # file: "/var/log/boot-service.log" # Log file (when output: file)
# Health check configuration
health:
  enabled: true # Enable health check endpoint
  endpoint: "/health" # Health check URL path
  timeout: 5 # Health check timeout in seconds
# =============================================================================
# PERFORMANCE AND SCALING
# =============================================================================
# Request limits
limits:
  max_request_size: "10MB" # Maximum request body size
  max_concurrent: 100 # Maximum concurrent requests
  rate_limit: 1000 # Requests per minute per IP
# Caching (future feature)
# cache:
#   enabled: false
#   type: "memory" # Cache type: memory, redis
#   ttl: "5m" # Cache TTL
#   max_size: "100MB" # Maximum cache size
# =============================================================================
# DEVELOPMENT AND TESTING
# =============================================================================
# Development mode settings
development:
  enabled: false # Enable development mode
  cors_enabled: true # Enable CORS for browser testing
  cors_origins: ["*"] # Allowed CORS origins
  debug_endpoints: false # Enable debug/diagnostic endpoints
  mock_services: false # Use mock external services
# =============================================================================
# DEPLOYMENT ENVIRONMENT EXAMPLES
# =============================================================================
# Uncomment and modify one of these sections for your deployment environment:
# --- Development Environment ---
# enable_auth: false
# enable_metrics: true
# logging:
#   level: "debug"
# development:
#   enabled: true
#   debug_endpoints: true
# --- Production Environment ---
# enable_auth: true
# enable_metrics: true
# auth:
#   enabled: true
#   jwks_url: "https://auth.openchami.org/.well-known/jwks.json"
#   jwt_issuer: "https://auth.openchami.org"
#   jwt_audience: "boot-service"
#   required_scopes: ["boot:read"]
# logging:
#   level: "info"
#   format: "json"
# --- Kubernetes/Container Environment ---
# port: 8080
# host: "0.0.0.0"
# data_dir: "/data"
# auth:
#   jwks_url: "http://tokensmith:8080/.well-known/jwks.json"
#   jwt_issuer: "openchami-tokensmith"
#   jwt_audience: "openchami-cluster"
# hsm_url: "http://smd:27779"
# logging:
#   format: "json"
#   output: "stdout"
```
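The precedence list in that file's header (flags, then environment, then config file, then defaults) can be illustrated with a small shell sketch for the `port` setting. This is not boot-service's code. `resolve_port` is hypothetical, and the default is passed in as an argument because the binary's built-in defaults aren't shown here. Only `BOOT_SERVICE_PORT` comes from the config comments.

```shell
#!/bin/sh
# Hypothetical sketch of the documented precedence order:
#   command-line flag > environment variable > config file > default.
# Usage: resolve_port FLAG_VALUE FILE_VALUE DEFAULT_VALUE
# An empty argument (or an unset BOOT_SERVICE_PORT) means "not set".
resolve_port() {
  if [ -n "$1" ]; then
    echo "$1"                    # 1. command-line flag wins
  elif [ -n "${BOOT_SERVICE_PORT:-}" ]; then
    echo "$BOOT_SERVICE_PORT"    # 2. then the environment variable
  elif [ -n "$2" ]; then
    echo "$2"                    # 3. then the value from config.yaml
  else
    echo "$3"                    # 4. finally the default
  fi
}
```

For example, with `BOOT_SERVICE_PORT=9000` exported and `port: 8082` in the file, `resolve_port "" 8082 8082` yields `9000`. In a container deployment, this means an environment override in the quadlet would silently beat whatever is in the mounted `config.yaml`.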
We may want to add default hostname rules, since the default when none are defined is to prefix with
The above will make the node hostnames look like
synackd left a comment:
Initial code review without testing this yet.
Force-pushed from 240792f to a800adc.
synackd left a comment:
Testing now. I get:

```
sed: can't read /etc/containers/systemd/opaal.container: No such file or directory
```

when running the `openchami-certificate-update` script.
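If `opaal.container` is intentionally gone with this PR, deleting that `sed` call from `openchami-certificate-update` is probably the simplest fix. If some units are meant to stay optional, a guard like the hypothetical `sed_if_present` below would skip missing files with a warning instead of failing. This is a sketch that assumes GNU `sed -i`, and the helper name is not from the repo.

```shell
#!/bin/sh
# Hypothetical helper: run an in-place sed edit only when the target exists,
# so a removed unit (e.g. opaal.container) is skipped instead of aborting.
# Usage: sed_if_present FILE SED_ARGS...
sed_if_present() (
  file=$1
  shift
  if [ -f "$file" ]; then
    sed -i "$@" "$file"
  else
    echo "skipping $file: not found" >&2
  fi
)
```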
If we're getting rid of Hydra, we'll want to remove references to it, e.g. in `release/scripts/openchami_profile.sh` (line 27 in d77457c). We can probably just get rid of those functions.
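To catch any remaining Hydra references beyond `openchami_profile.sh`, here's a minimal sketch using GNU `grep`, where `-I` skips binary files. `find_refs` is a hypothetical wrapper.

```shell
#!/bin/sh
# List files under DIR (default: current directory) that mention TERM.
# -r recurse, -i ignore case, -l print file names only, -I skip binary files.
find_refs() {
  grep -rilI -- "$1" "${2:-.}"
}
```

Running `find_refs hydra release/` from the repo root would list each file that still needs cleanup.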
With some of the upstream changes to how metadata-service works, we might now have to mount a volume and set `--data-dir`. I get a "permission denied" error when I try to start it with `pr-8` using the current `Exec`.
I tried adding a volume like `Volume=/opt/workdir/data:/data` and running `chmod 777 /opt/workdir/data`, and that fixes the permission denied issue for me.
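Here's a minimal sketch of that host-side preparation. It assumes the quadlet keeps `Volume=/opt/workdir/data:/data` and passes `--data-dir /data` to the service; the exact `Exec` line isn't shown here, and `prepare_data_dir` is a hypothetical name. A less permissive option may be podman's `U` volume option (`Volume=/opt/workdir/data:/data:U`), which chowns the source directory to the container's user. That option hasn't been tested against this image.

```shell
#!/bin/sh
# Create the host directory backing the metadata-service /data volume and make
# it world-writable, mirroring the manual workaround above. chmod 777 is the
# permissive fix from testing, not a hardened setting.
prepare_data_dir() {
  dir=${1:-/opt/workdir/data}
  mkdir -p "$dir" && chmod 777 "$dir"
}
```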
```
May 12 14:51:27 openchami-testing.novalocal systemd[1]: Started The metadata-service container.
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: 2026/05/12 14:51:27 Starting github.com/OpenCHAMI/metadata-service server...
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: Error: failed to initialize file storage: failed to create file backend: failed to create base directory /data: mkdir /data: permission denied
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: Usage:
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: ochami-metadata-server serve [flags]
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]:
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: Flags:
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --data-dir string Directory for file storage (default "/data")
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: -h, --help help for serve
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --host string Host to bind to (default "0.0.0.0")
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --idle-timeout int Idle timeout in seconds (default 60)
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: -p, --port int Port to listen on (default 8080)
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --read-timeout int Read timeout in seconds (default 15)
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --wireguard-only Restrict access to WireGuard network only
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --wireguard-server string Enable WireGuard userspace controller (CIDR, e.g. 100.97.0.1/16)
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --wireguard-state-file string Path to WireGuard state file for persistence (default "/data/wireguard/state.yaml")
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --write-timeout int Write timeout in seconds (default 15)
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]:
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: Global Flags:
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --config string config file (default is $HOME/.ochami-metadata.yaml)
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: --debug Enable debug logging
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]:
May 12 14:51:27 openchami-testing.novalocal metadata-service[2974370]: 2026/05/12 14:51:27 failed to initialize file storage: failed to create file backend: failed to create base directory /data: mkdir /data: permission denied
```

We'll also have to remove the `--tokensmith-url` flag, at least for now.
I added a volume for metadata-service to store its data. I also removed `--tokensmith-url` until that is added back in.
I think this was literally added back today with this PR:
OpenCHAMI/metadata-service#12
Unresolving for further discussion/action. I assume we'll want this.
Signed-off-by: Travis Cotton <trcotton@lanl.gov>
Force-pushed from 3cc4ae4 to 49b143d.
Signed-off-by: Devon Bautista <17506592+synackd@users.noreply.github.com>
Pull Request Template
Thank you for your contribution! Please ensure the following before submitting:
Checklist
- make test (or equivalent) locally and all tests pass
- git commit -s) with my real name and email
- <filename>.license sidecar
- LICENSES/ directory

Description
Please include a summary of the change and which issue is fixed.
Also include relevant motivation and context.
Fixes #(issue)
Type of Change
For more info, see Contributing Guidelines.