From 6482813d95c60cdb739298885133a9b4f4062754 Mon Sep 17 00:00:00 2001 From: aRustyDev <36318507+aRustyDev@users.noreply.github.com> Date: Sat, 3 Jan 2026 19:57:50 -0500 Subject: [PATCH] feat(cicd-github-actions-ops): improve skill documentation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Closes #590 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude --- .../skills/cicd-github-actions-ops/SKILL.md | 256 ++----- .../references/debugging.md | 170 ++++- .../references/monitoring.md | 449 ++++++++++++ .../references/multi-repo.md | 149 +++- .../references/performance.md | 584 ++++++++++++++++ .../references/quick-reference.md | 295 ++++++++ .../references/security.md | 637 ++++++++++++++++++ 7 files changed, 2347 insertions(+), 193 deletions(-) create mode 100644 components/skills/cicd-github-actions-ops/references/monitoring.md create mode 100644 components/skills/cicd-github-actions-ops/references/performance.md create mode 100644 components/skills/cicd-github-actions-ops/references/quick-reference.md create mode 100644 components/skills/cicd-github-actions-ops/references/security.md diff --git a/components/skills/cicd-github-actions-ops/SKILL.md b/components/skills/cicd-github-actions-ops/SKILL.md index 44803261..9d473210 100644 --- a/components/skills/cicd-github-actions-ops/SKILL.md +++ b/components/skills/cicd-github-actions-ops/SKILL.md @@ -157,218 +157,101 @@ Use consistent patterns across all repositories. 
## Systematic Review Workflow -### Phase 0: Fork Detection +### Quick Assessment -Before reviewing, check if the repository is a fork: +Before detailed review, assess complexity and identify fork-specific issues: ```bash -# Check if repo is a fork +# Check if repo is a fork and assess complexity gh repo view --json isFork,parent -q '{fork: .isFork, parent: .parent.nameWithOwner}' +echo "=== Workflow Count ===" && ls -1 .github/workflows/*.yml 2>/dev/null | wc -l ``` -**If forked, identify upstream-specific patterns:** +**Workflow complexity tiers:** -| Pattern | Detection | Common Issues | -|---------|-----------|---------------| -| External deploy target | `external_repository:` in workflow | Deploys to upstream's gh-pages | -| Deploy keys | `secrets.DEPLOY_KEY` | Secret doesn't exist in fork | -| Hardcoded org | `google/timesketch` in workflow | Wrong target org | -| Upstream branches | `branches: [main]` when fork uses `master` | Branch mismatch | -| Upstream composite actions | `uses: /.github/actions/` | Action path doesn't exist in fork | -| Hardcoded Docker namespace | `docker.*/` | Pushes to wrong Docker Hub namespace | -| External registries | `hub.infinyon.cloud` or similar | Upstream-specific package registry | -| Upstream secrets | `secrets.ORG_*` or `secrets.DOCKER_*` | Organization secrets not available | +| Tier | Workflows | Approach | +|------|-----------|----------| +| Simple | 1-5 | Fix all in one PR | +| Medium | 6-10 | Fix by priority, 1-2 PRs | +| Complex | 11+ | Incremental fixes, multiple PRs | -```bash -# Comprehensive fork detection -grep -rE "external_repository:|DEPLOY_KEY|\.github/actions/" .github/workflows/ -grep -rE "secrets\.(ORG_|DOCKER_|SLACK_|AWS_)" .github/workflows/ -grep -rE "https?://[a-z-]+\.[a-z]+\.(cloud|io)/" .github/workflows/ | grep -v github -``` - -**Fork handling options:** - -1. **Disable** - Rename to `.yml.disabled` (recommended for deploy workflows) -2. **Adapt** - Modify to work with your fork -3. 
**Remove** - Delete if not needed -4. **Keep** - Leave as-is if it will work (rare) - -```bash -# Disable a workflow -mv .github/workflows/deploy.yml .github/workflows/deploy.yml.disabled - -# Find upstream-specific patterns -grep -r "external_repository\|DEPLOY_KEY\|google/" .github/workflows/ -``` - -### Phase 0.5: Complexity Assessment - -Before diving into fixes, assess the scope of work: - -```bash -# Count workflows and total lines -echo "=== Workflow Complexity ===" -ls -1 .github/workflows/*.yml 2>/dev/null | wc -l | xargs echo "Workflow count:" -wc -l .github/workflows/*.yml 2>/dev/null | tail -1 | awk '{print "Total lines:", $1}' - -# Count action dependencies -echo "=== Action Dependencies ===" -grep -h "uses:" .github/workflows/*.yml 2>/dev/null | wc -l | xargs echo "Action references:" -grep -h "uses:" .github/workflows/*.yml 2>/dev/null | grep -oE '[^/]+/[^@]+' | sort -u | wc -l | xargs echo "Unique actions:" - -# Count job dependencies (complexity indicator) -echo "=== Job Dependencies ===" -grep -c "needs:" .github/workflows/*.yml 2>/dev/null | awk -F: '{sum+=$2} END {print "Total needs: clauses:", sum}' - -# Matrix sprawl check -echo "=== Matrix Size ===" -grep -A20 "matrix:" .github/workflows/*.yml 2>/dev/null | grep -E "^\s+-\s" | wc -l | xargs echo "Matrix entries:" -``` - -**Complexity tiers:** +**Fork-specific patterns to detect:** +- `external_repository:` or `secrets.DEPLOY_KEY` → Upstream deployment +- `secrets.(ORG_|DOCKER_|AWS_)` → Organization secrets not available +- `uses: /.github/actions/` → Composite actions missing in fork -| Tier | Workflows | Lines | Approach | -|------|-----------|-------|----------| -| Simple | 1-5 | <500 | Fix all in one PR | -| Medium | 6-10 | 500-1500 | Fix by priority, 1-2 PRs | -| Complex | 11+ | 1500+ | Incremental fixes, multiple PRs | -| Massive | 15+ | 3000+ | Consider disable-first strategy | +See: [multi-repo.md](references/multi-repo.md) for detailed fork handling and complexity assessment. 
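The quick assessment above can be wrapped into reusable shell functions. This is a minimal sketch and is not part of `gh` or `actionlint`. The names `count_workflows`, `complexity_tier`, and `fork_pattern_hits` are hypothetical helpers. The tier boundaries come from the table above, with 0 workflows reported as `None`. The fork check is a plain `grep -E` over the three listed patterns, so it can miss variants that are spelled differently. Unlike the one-liner, it also counts `.yaml` files.

```shell
#!/usr/bin/env bash
# Hypothetical quick-assessment helpers; not part of gh or actionlint.

# Count workflow files (.yml and .yaml) in a directory.
count_workflows() {
  local dir="${1:-.github/workflows}" n=0 f
  for f in "$dir"/*.yml "$dir"/*.yaml; do
    [ -f "$f" ] && n=$((n + 1))   # an unmatched glob stays literal and fails -f
  done
  echo "$n"
}

# Map a workflow count onto the tiers from the table above.
complexity_tier() {
  local count="$1"
  if [ "$count" -eq 0 ]; then
    echo "None"
  elif [ "$count" -le 5 ]; then
    echo "Simple"
  elif [ "$count" -le 10 ]; then
    echo "Medium"
  else
    echo "Complex"
  fi
}

# Print file:line:text for each workflow line matching a fork-specific pattern.
fork_pattern_hits() {
  local dir="${1:-.github/workflows}"
  grep -rnE 'external_repository:|secrets\.DEPLOY_KEY|secrets\.(ORG_|DOCKER_|AWS_)' "$dir" 2>/dev/null
}
```

From the repository root, `complexity_tier "$(count_workflows)"` picks the approach. Running `fork_pattern_hits` is only worthwhile when `gh repo view --json isFork` reports a fork.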
-**If complexity is High/Massive:** +### Review Process -1. Start with disabling non-essential workflows -2. Focus on Priority 2 fixes (concurrency, path filters) first -3. Address failures incrementally -4. Document known limitations that won't be fixed +1. **Categorize Issues**: Working vs. reasonable vs. passing vs. efficient +2. **Fix by Priority**: Follow Priority 1-6 order (see above) +3. **Track Decisions**: Create tracking issues for action choices +4. **Validate Changes**: Use `actionlint` and version checks +5. **Document Limitations**: When partial fixes are acceptable -### Phase 1: Gather Information +See: [debugging.md](references/debugging.md) for detailed systematic debugging process. -```bash -# List all open PRs across your repos -gh search prs --author aRustyDev --state open --limit 100 +## Security & Hardening -# List failed workflow runs -gh run list --repo / --status failure --limit 20 +Essential security practices for production workflows: -# Get workflow files for a repo -gh api repos///contents/.github/workflows | jq -r '.[].name' +### Token Permissions +```yaml +permissions: + contents: read # Default, restrict further if possible + pull-requests: write # Only for PR workflows + id-token: write # Only for OIDC authentication ``` -### Phase 2: Categorize Issues - -For each PR/failure, categorize: - -1. **Workflow broken** - Action itself has bugs -2. **Workflow inefficient** - Runs unnecessarily -3. **Test failure** - Code issue, not workflow -4. **Permission issue** - Token/access problems -5. **Environment issue** - Runner/dependency problems -6. 
**Flaky test** - Intermittent failures - -### Phase 3: Fix by Category - -| Category | Action | -|----------|--------| -| Workflow broken | Fix workflow, update action versions | -| Workflow inefficient | Add path filters, concurrency | -| Test failure | Fix code, not workflow | -| Permission issue | Adjust permissions block | -| Environment issue | Pin versions, add setup steps | -| Flaky test | Add retry or fix root cause | - -### Phase 4: Track Decisions - -For every non-trivial decision, create appropriate tracking: +### Common Security Issues -- **Chose reliable over fancy** → Issue in `arustydev/gha` -- **Chose third-party over self-hosted** → Issue in `arustydev/gha` -- **Found bug in action** → Issue in action's repo -- **Need new action** → Issue in `arustydev/gha` - -### Phase 5: Validate Before Committing - -Before committing workflow changes, validate them: +| Issue | Risk | Fix | +|-------|------|-----| +| `permissions: write-all` | Full repo access | Use minimal permissions | +| Secrets in logs | Credential exposure | Use `::add-mask::` for outputs | +| Untrusted input in scripts | Code injection | Pass `${{ github.event.* }}` values through `env:` instead of interpolating them into `run:` | +| Third-party actions without pinning | Supply chain attacks | Pin to commit SHA: `@abc123...` | + +### Security Validation ```bash +# Check for overprivileged workflows +grep -r "permissions:" .github/workflows/ | grep -E "(write-all|write.*write)" -# 2. Verify action versions exist -for action in $(grep -h "uses:" .github/workflows/*.yml | grep -oE '[^/]+/[^@]+@v[0-9]+' | sort -u); do - repo=$(echo "$action" | cut -d@ -f1) - version=$(echo "$action" | cut -d@ -f2) - echo -n "$action: " - gh api "repos/$repo/git/refs/tags/$version" --silent && echo "OK" || echo "NOT FOUND" -done +# Find unpinned actions (security risk) +grep -r "uses:.*@v[0-9]" .github/workflows/ -# 3.
Check for deprecated actions -grep -r "actions-rs/\|set-output\|save-state" .github/workflows/ && echo "WARNING: Deprecated patterns found" +# Check for potential injection points +grep -r "github\.event\." .github/workflows/ ``` -**Common validation failures:** - -| Error | Cause | Fix | -|-------|-------|-----| -| `action version not found` | Invalid version (v6 doesn't exist) | Check [action-selection.md](references/action-selection.md) for valid versions | -| `set-output is deprecated` | Old output syntax | Use `echo "name=value" >> $GITHUB_OUTPUT` | -| `save-state is deprecated` | Old state syntax | Use `echo "name=value" >> $GITHUB_STATE` | - -### Phase 6: Partial Fixes and Known Limitations - -Not every issue can or should be fully fixed. Know when to stop. - -**When to accept a partial fix:** - -| Situation | Action | -|-----------|--------| -| Fixing requires rewriting >50% of workflow | Disable or document limitation | -| Need to create custom actions for fork | Document as future work | -| External service dependencies can't be removed | Disable affected jobs/workflows | -| Upstream architecture tightly coupled | Accept reduced CI coverage | - -**Documenting known limitations:** - -When creating a PR with partial fixes, include a "Known Limitations" section: +See: [security.md](references/security.md) for comprehensive security hardening guide. -```markdown -### Known Limitations +## Performance Optimization -The following issues remain after this fix: +Key areas for workflow performance improvement: -| Issue | Reason | Impact | -|-------|--------|--------| -| `cli_smoke` job fails | Uses upstream's Infinyon Hub | Integration tests don't run | -| Docker builds use wrong namespace | Would require forking build scripts | Images not pushed | - -These would require significant refactoring to address. 
+### Concurrency Control +```yaml +concurrency: + group: ${{ github.workflow }}-${{ github.ref }} + cancel-in-progress: ${{ github.ref != 'refs/heads/main' }} ``` -**When to ask the user:** - -If any of these apply, use AskUserQuestion before proceeding: +### Efficient Triggers +- Use `paths:` filters to avoid unnecessary runs +- Prefer `pull_request` over `push` for PR validation +- Use `workflow_dispatch` for manual triggers instead of broad automation -- Complete fix requires >2 hours of refactoring -- Fix would change core project behavior -- Multiple equally valid approaches exist -- Fork has diverged significantly from upstream - -**Incremental progress strategy:** - -For complex repositories, prefer multiple small PRs: - -``` -PR 1: Disable non-essential workflows (quick win) - ↓ -PR 2: Add concurrency blocks to remaining workflows - ↓ -PR 3: Fix path filters and triggers - ↓ -PR 4: Address specific test failures - ↓ -(Optional) PR 5: Deep refactoring if needed +### Caching Strategies +```yaml +- uses: actions/cache@v4.1.2 + with: + path: ~/.cargo + key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }} ``` -Each PR should be independently mergeable and improve the situation. +See: [performance.md](references/performance.md) for optimization strategies and monitoring. 
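You can audit the concurrency and path-filter advice above across a repository before editing any workflow. This is a minimal sketch with hypothetical helper names, `missing_concurrency` and `missing_path_filters`. It is a line-oriented `grep`, not a YAML parser, so a matching key inside a comment or string counts as present. It recognizes triggers only in block form (`push:`) or on a top-level `on:` line.

```shell
#!/usr/bin/env bash
# Hypothetical audit helpers; a line-oriented grep, not a YAML parser.

# List workflow files that never mention a `concurrency:` key.
missing_concurrency() {
  local dir="${1:-.github/workflows}" f
  for f in "$dir"/*.yml "$dir"/*.yaml; do
    [ -f "$f" ] || continue
    grep -qE '^[[:space:]]*concurrency:' "$f" || echo "$f"
  done
}

# List workflow files triggered by push/pull_request without paths filters.
missing_path_filters() {
  local dir="${1:-.github/workflows}" f
  for f in "$dir"/*.yml "$dir"/*.yaml; do
    [ -f "$f" ] || continue
    # Block-style trigger keys, or an inline top-level `on:` line.
    if grep -qE '^[[:space:]]*(push|pull_request):|^on:.*(push|pull_request)' "$f" \
      && ! grep -qE '^[[:space:]]*(paths|paths-ignore):' "$f"; then
      echo "$f"
    fi
  done
}
```

Treat each file these helpers list as a candidate for review, not as a confirmed problem. Some workflows, such as release pipelines, deliberately run on every push.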
## Quick Commands @@ -407,7 +290,12 @@ done ## See Also -- Reference: [debugging.md](references/debugging.md) - Detailed debugging guide -- Reference: [action-selection.md](references/action-selection.md) - Action selection criteria -- Reference: [issue-templates.md](references/issue-templates.md) - Issue templates for tracking -- Reference: [multi-repo.md](references/multi-repo.md) - Multi-repository batch review +- **Quick Reference**: [quick-reference.md](references/quick-reference.md) - Essential commands and patterns +- **Deep Dive References**: + - [debugging.md](references/debugging.md) - Systematic debugging procedures + - [security.md](references/security.md) - Security hardening and best practices + - [performance.md](references/performance.md) - Performance optimization strategies + - [monitoring.md](references/monitoring.md) - Monitoring and alerting setup + - [multi-repo.md](references/multi-repo.md) - Multi-repository review workflows + - [action-selection.md](references/action-selection.md) - Action selection criteria + - [issue-templates.md](references/issue-templates.md) - Issue templates for tracking diff --git a/components/skills/cicd-github-actions-ops/references/debugging.md b/components/skills/cicd-github-actions-ops/references/debugging.md index 7389490e..1effd4ee 100644 --- a/components/skills/cicd-github-actions-ops/references/debugging.md +++ b/components/skills/cicd-github-actions-ops/references/debugging.md @@ -257,15 +257,177 @@ act -s GITHUB_TOKEN="$(gh auth token)" act pull_request -e event.json ``` +## Advanced Debugging Techniques + +### Workflow Performance Analysis + +```bash +# Analyze slow-running workflows (tab-separated, so names with spaces sort correctly) +gh run list --limit 20 --json workflowName,startedAt,updatedAt,displayTitle | \ + jq -r '.[] | [.workflowName, (.updatedAt | fromdateiso8601) - (.startedAt | fromdateiso8601), .displayTitle] | @tsv' | \ + sort -t$'\t' -k2,2nr | head -10 +``` + +### Job Dependency Analysis + +```bash +# Visualize job dependencies in complex workflows
+grep -A 20 "needs:" .github/workflows/*.yml | \ + grep -E "(needs:|name:)" | \ + paste - - | \ + column -t +``` + +### Matrix Job Failure Patterns + +```bash +# Find which matrix combinations fail most often +gh run list --status failure --limit 100 --json displayTitle | \ + jq -r '.[].displayTitle' | \ + grep -oE '\([^)]+\)' | \ + sort | uniq -c | sort -nr +``` + +### Secret and Environment Variable Debug + +```yaml +- name: Debug environment (safely) + run: | + echo "Exported variable names (values not printed):" + compgen -e | sort + echo "Context info:" + echo "Repository: ${{ github.repository }}" + echo "Event: ${{ github.event_name }}" + echo "Actor: ${{ github.actor }}" + echo "Ref: ${{ github.ref }}" +``` + +### Artifact Analysis + +```bash +# Download and analyze build artifacts +gh run download +find . -name "*.log" -exec echo "=== {} ===" \; -exec head -50 {} \; +``` + +### Container/Docker Issues + +```yaml +- name: Debug Docker environment + if: failure() + run: | + echo "=== Docker Info ===" + docker --version + docker system df + echo "=== Running Containers ===" + docker ps -a + echo "=== Images ===" + docker images + echo "=== Networks ===" + docker network ls +``` + +## Emergency Response Procedures + +### Mass Workflow Failures + +When multiple workflows fail simultaneously: + +```bash +#!/bin/bash +# mass-failure-triage.sh - Quick triage for widespread failures + +echo "=== Mass Failure Analysis ===" + +# Count failures by time +gh run list --status failure --limit 50 --json createdAt | \ + jq -r '.[].createdAt' | \ + cut -dT -f1 | \ + sort | uniq -c + +# Common error patterns +echo "=== Common Error Patterns ===" +for run in $(gh run list --status failure --limit 20 --json databaseId -q '.[].databaseId'); do + gh run view "$run" --log-failed 2>/dev/null | \ + grep -E "(Error:|ERROR:|FAILED)" | \ + head -5 +done | sort | uniq -c | sort -nr | head -10 + +# Check GitHub Status +echo "=== GitHub Status ===" +curl -s
https://www.githubstatus.com/api/v2/status.json | jq -r '.status.description' +``` + +### Rollback Strategy + +```bash +# Quick rollback for broken workflow +commit=$(git log -1 --format="%H" -- .github/workflows/.yml) # Last commit that changed the workflow +git checkout "$commit~1" -- .github/workflows/.yml # Restore the version from before that commit +git add .github/workflows/.yml +git commit -m "fix(ci): rollback to working state" +``` + +## Integration with Monitoring + +### Automated Failure Detection + +```yaml +# .github/workflows/failure-monitor.yml +name: Failure Monitor +on: + workflow_run: + workflows: ["CI", "Tests", "Build"] + types: [completed] + +jobs: + notify-on-failure: + if: github.event.workflow_run.conclusion == 'failure' + runs-on: ubuntu-latest + steps: + - name: Analyze failure + run: | + echo "Failed workflow: ${{ github.event.workflow_run.name }}" + echo "Run ID: ${{ github.event.workflow_run.id }}" + + # Fetch failure details (no checkout, so gh reads the repo from GH_REPO) + gh run view ${{ github.event.workflow_run.id }} --log-failed | \ + grep -E "(Error:|ERROR:)" | head -10 + env: + GH_TOKEN: ${{ github.token }} + GH_REPO: ${{ github.repository }} + + - name: Create issue for repeated failures + uses: actions/github-script@v7 + with: + script: | + // Logic to create issue if same workflow fails >3 times + // Include failure analysis and suggested fixes +``` + ## Debugging Checklist +### Pre-Debugging +- [ ] Check GitHub Status page for service issues +- [ ] Verify recent changes to workflow files +- [ ] Check if failure is isolated or widespread + +### During Debugging - [ ] Read the full error message - [ ] Identify which step failed - [ ] Check if it's consistent or flaky -- [ ] Check recent changes to workflow or code - [ ] Verify permissions are correct - [ ] Check for environment differences -- [ ] Try reproducing locally +- [ ] Try reproducing locally with act + +### Post-Debugging +- [ ] Document the fix in commit message +- [ ] Update workflow if needed to prevent recurrence - [ ] Check action versions and changelogs -- [ ] Search for known issues in action repo
-- [ ] Add debug logging if needed +- [ ] Create tracking issue if it's a recurring problem +- [ ] Add monitoring if it's a critical workflow + +## Cross-References + +- [monitoring.md](monitoring.md) - Set up proactive monitoring to catch issues early +- [security.md](security.md) - Security-related debugging (permissions, secrets) +- [performance.md](performance.md) - Performance-related failures and optimization diff --git a/components/skills/cicd-github-actions-ops/references/monitoring.md b/components/skills/cicd-github-actions-ops/references/monitoring.md new file mode 100644 index 00000000..762b7be9 --- /dev/null +++ b/components/skills/cicd-github-actions-ops/references/monitoring.md @@ -0,0 +1,449 @@ +# Monitoring & Alerting Reference + +Comprehensive guide for tracking GitHub Actions workflow health and performance across repositories. + +## Overview + +Proactive monitoring prevents CI/CD issues from accumulating and helps maintain healthy development velocity. + +## Workflow Health Metrics + +### Key Performance Indicators + +| Metric | Healthy Range | Alert Threshold | Command | +|--------|---------------|-----------------|---------| +| Workflow Success Rate | >95% | <90% | `gh run list --status failure --limit 50` | +| Average Build Time | Within baseline +20% | Above baseline +50% | `gh run list --limit 100 --json startedAt,updatedAt` | +| Queue Time | <2 minutes | >5 minutes | `gh run list --json createdAt,startedAt` | +| Failed Runs per Day | <5 | >10 | `gh run list --status failure --created ">=$(date -d '24 hours ago' +%Y-%m-%d)"` | + +### Daily Health Check + +```bash +#!/bin/bash +# daily-ci-health.sh - Run daily to check CI health + +repo=${1:-$(gh repo view --json nameWithOwner -q .nameWithOwner)} +echo "=== CI Health Report for $repo ===" + +# Recent failure rate +total=$(gh run list --repo "$repo" --limit 100 --json status | jq length) +failed=$(gh run list --repo "$repo" --limit 100 --status failure --json status | jq length) +if [ "$total" -eq 0 ]; then echo "No recent runs"; exit 0; fi +success_rate=$(( (total - failed) * 100 / total )) +
+echo "Success Rate (last 100 runs): $success_rate%" +if [ $success_rate -lt 90 ]; then + echo "⚠️ WARNING: Success rate below 90%" +fi + +# Failed runs today +today_failed=$(gh run list --repo "$repo" --status failure --created $(date +%Y-%m-%d) --json status | jq length) +echo "Failed runs today: $today_failed" +if [ $today_failed -gt 10 ]; then + echo "🚨 ALERT: More than 10 failures today" +fi + +# Long-running workflows (over 10 minutes) +echo "=== Long-running workflows ===" +gh run list --repo "$repo" --limit 10 --json displayTitle,startedAt,updatedAt | \ + jq -r '.[] | ((.updatedAt | fromdateiso8601) - (.startedAt | fromdateiso8601)) as $d | select($d > 600) | "\(.displayTitle): \($d)s"' +``` + +## Multi-Repository Monitoring + +### Organization Health Dashboard + +```bash +#!/bin/bash +# org-ci-dashboard.sh - Monitor all repos in organization + +org=${1:-aRustyDev} +echo "=== Organization CI Health: $org ===" + +for repo in $(gh repo list "$org" --limit 100 --json name -q '.[].name'); do + echo "--- $repo ---" + + # Recent runs summary + total=$(gh run list --repo "$org/$repo" --limit 20 --json status 2>/dev/null | jq length) + if [ "${total:-0}" -eq 0 ]; then + echo "No recent runs" + continue + fi + + failed=$(gh run list --repo "$org/$repo" --limit 20 --status failure --json status 2>/dev/null | jq length) + success_rate=$(( (total - failed) * 100 / total )) + + if [ $success_rate -lt 80 ]; then + echo "🚨 $success_rate% success rate - NEEDS ATTENTION" + elif [ $success_rate -lt 95 ]; then + echo "⚠️ $success_rate% success rate - monitor" + else + echo "✅ $success_rate% success rate" + fi +done +``` + +### Repository Ranking by CI Health + +```bash +# Rank repositories by CI reliability (repositoryOwner works for both users and organizations) +gh api graphql -f query=' +query { + repositoryOwner(login: "aRustyDev") { + repositories(first: 50, orderBy: {field: PUSHED_AT, direction: DESC}) { + nodes { + name + defaultBranchRef { + target { + ...
on Commit { + checkSuites(first: 10) { + nodes { + conclusion + status + } + } + } + } + } + } + } + } +}' | jq -r '.data.repositoryOwner.repositories.nodes[] | + select(.defaultBranchRef.target.checkSuites.nodes | length > 0) | + [.name, (.defaultBranchRef.target.checkSuites.nodes | + map(select(.conclusion == "SUCCESS") | 1) | length), + (.defaultBranchRef.target.checkSuites.nodes | length)] | + @csv' | sort -t, -k2,2nr +``` + +## Failure Pattern Analysis + +### Common Failure Categories + +| Category | Detection Pattern | Alert Condition | +|----------|-------------------|-----------------| +| Flaky Tests | Same test fails intermittently | >3 failures/week for same test | +| Environment Issues | "No space left", "Connection timeout" | >2 env failures/day | +| Dependency Issues | "Package not found", "Version conflict" | Any dependency failure | +| Permission Issues | "Authentication failed", "Permission denied" | Any permission failure | +| Timeout Issues | "Job canceled due to timeout" | >30 minute jobs failing | + +### Failure Analysis Script + +```bash +#!/bin/bash +# analyze-failures.sh - Categorize recent failures + +repo=${1:-$(gh repo view --json nameWithOwner -q .nameWithOwner)} +days=${2:-7} + +echo "=== Failure Analysis: Last $days days ===" + +# Get failed runs from last N days (">=" selects every day since, not just that date) +since_date=$(date -d "$days days ago" +%Y-%m-%d) +gh run list --repo "$repo" --status failure --created ">=$since_date" --limit 200 \ + --json displayTitle,conclusion,url | \ + jq -r '.[] | "\(.displayTitle) | \(.url)"' | \ + while read -r line; do + title=$(echo "$line" | cut -d'|' -f1 | xargs) + url=$(echo "$line" | cut -d'|' -f2 | xargs) + + # Categorize failure + case "$title" in + *"timeout"*|*"timed out"*) + echo "⏰ TIMEOUT: $title" + echo " $url" + ;; + *"space"*|*"disk"*|*"memory"*) + echo "💾 RESOURCE: $title" + echo " $url" + ;; + *"permission"*|*"authentication"*|*"forbidden"*) + echo "🔐 AUTH: $title" + echo " $url" + ;; + *"dependency"*|*"package"*|*"module"*) + echo "📦 DEPENDENCY:
$title" + echo " $url" + ;; + *) + echo "❓ OTHER: $title" + echo " $url" + ;; + esac + done +``` + +## Performance Monitoring + +### Build Time Tracking + +```bash +#!/bin/bash +# track-build-times.sh - Monitor workflow performance trends + +repo=${1:-$(gh repo view --json nameWithOwner -q .nameWithOwner)} + +echo "=== Build Time Analysis ===" + +gh run list --repo "$repo" --limit 50 --json workflowName,displayTitle,startedAt,updatedAt | \ + jq -r '.[] | + [.workflowName, + (.updatedAt | fromdateiso8601) - (.startedAt | fromdateiso8601), + .displayTitle] | + @csv' | \ + sort -t, -k2,2nr | \ + head -20 | \ + while IFS=, read -r workflow duration title; do + workflow=$(echo "$workflow" | tr -d '"') + duration=$(echo "$duration" | tr -d '"') + title=$(echo "$title" | tr -d '"') + + minutes=$((duration / 60)) + seconds=$((duration % 60)) + + printf "%-20s %2dm%02ds %s\n" "$workflow" "$minutes" "$seconds" "$title" + done +``` + +### Resource Usage Trends + +```bash +# Monitor runner usage across workflows +gh api graphql -f query=' +query { + repository(owner: "aRustyDev", name: "your-repo") { + object(expression: "main:.github/workflows") { + ... on Tree { + entries { + name + object { + ... on Blob { + text + } + } + } + } + } + } +}' | jq -r '.data.repository.object.entries[].object.text' | \ + grep -E "runs-on:|timeout-minutes:" | \ + sort | uniq -c | sort -nr +``` + +## Alerting Strategies + +### Slack Integration + +```bash +#!/bin/bash +# slack-ci-alert.sh - Send Slack alerts for CI issues + +webhook_url="$SLACK_WEBHOOK_URL" +repo=${1:-$(gh repo view --json nameWithOwner -q .nameWithOwner)} + +# Check for critical failures +critical_failures=$(gh run list --repo "$repo" --status failure --limit 10 --json displayTitle | jq length) + +if [ "$critical_failures" -gt 5 ]; then + payload=$(cat <Weekly CI Health Report" + echo "

Top Failing Workflows

" + echo "" + echo "" + + for repo in $(gh repo list aRustyDev --limit 50 --json name -q '.[].name'); do + total=$(gh run list --repo "aRustyDev/$repo" --limit 100 --json status 2>/dev/null | jq length) + failed=$(gh run list --repo "aRustyDev/$repo" --limit 100 --status failure --json status 2>/dev/null | jq length) + + if [ "$total" -gt 0 ]; then + success_rate=$(( (total - failed) * 100 / total )) + echo "" + fi + done | sort -t'>' -k5,5n + + echo "
RepositoryFailuresSuccess Rate
$repo$failed$success_rate%
" +} | sendmail ci-team@company.com +``` + +## GitHub Actions Metrics API + +### Custom Metrics Collection + +```bash +#!/bin/bash +# collect-metrics.sh - Collect custom CI metrics + +repo=${1:-$(gh repo view --json nameWithOwner -q .nameWithOwner)} +output_file="ci-metrics-$(date +%Y%m%d).json" + +{ + echo "{" + echo " \"date\": \"$(date -I)\"," + echo " \"repository\": \"$repo\"," + + # Workflow summary + echo " \"workflows\": [" + gh api "repos/$repo/actions/workflows" | \ + jq -c '.workflows[] | {name: .name, state: .state, runs_url: .url}' | \ + while read -r workflow; do + echo " $workflow," + done | sed '$ s/,$//' + echo " ]," + + # Recent run statistics + echo " \"recent_runs\": {" + total_runs=$(gh run list --repo "$repo" --limit 100 --json status | jq length) + success_runs=$(gh run list --repo "$repo" --limit 100 --status success --json status | jq length) + failed_runs=$(gh run list --repo "$repo" --limit 100 --status failure --json status | jq length) + + echo " \"total\": $total_runs," + echo " \"success\": $success_runs," + echo " \"failed\": $failed_runs," + echo " \"success_rate\": $(( total_runs > 0 ? success_runs * 100 / total_runs : 0 ))" + echo " }" + echo "}" +} > "$output_file" + +echo "Metrics saved to: $output_file" +``` + +## Dashboard Setup + +### GitHub Repository Dashboard + +Create a simple HTML dashboard for monitoring multiple repositories: + +```html + + + + CI Health Dashboard + + + +

CI Health Dashboard

+
+ + + + +``` + +## Automation & Scheduling + +### Cron Jobs for Monitoring + +```bash +# Add to crontab for automated monitoring +# crontab -e + +# Daily health check at 9 AM +0 9 * * * /path/to/daily-ci-health.sh >> /var/log/ci-health.log 2>&1 + +# Weekly digest on Mondays at 8 AM +0 8 * * 1 /path/to/weekly-ci-digest.sh + +# Hourly failure alerts during business hours +0 9-17 * * 1-5 /path/to/slack-ci-alert.sh +``` + +### GitHub Actions Self-Monitoring + +```yaml +# .github/workflows/ci-monitoring.yml +name: CI Monitoring +on: + schedule: + - cron: '0 */6 * * *' # Every 6 hours + workflow_dispatch: + +jobs: + monitor: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Monitor CI Health + env: + GH_TOKEN: ${{ github.token }} # gh needs a token inside Actions + run: | + # Run monitoring scripts + ./scripts/daily-ci-health.sh + ./scripts/analyze-failures.sh + + - name: Post to Slack + if: failure() + uses: 8398a7/action-slack@v3 + with: + status: failure + text: "CI monitoring detected issues" + env: + SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }} +``` + +## Cross-References + +- [debugging.md](debugging.md) - How to debug specific workflow failures +- [performance.md](performance.md) - Optimization strategies based on monitoring data +- [security.md](security.md) - Security monitoring for workflows \ No newline at end of file diff --git a/components/skills/cicd-github-actions-ops/references/multi-repo.md b/components/skills/cicd-github-actions-ops/references/multi-repo.md index 012cd530..b8bd9d1a 100644 --- a/components/skills/cicd-github-actions-ops/references/multi-repo.md +++ b/components/skills/cicd-github-actions-ops/references/multi-repo.md @@ -41,17 +41,156 @@ Group repos by type and apply consistent fixes: ### Strategy 3: Fork Audit -Special handling for forked repositories: +Special handling for forked repositories requires checking for upstream-specific patterns: ```bash # List all forks gh repo list aRustyDev --fork --limit 100 --json name,parent -q '.[] | "\(.name) <- \(.parent.nameWithOwner)"' ```
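The fork listing above prints one `name <- owner/repo` line per fork. This minimal sketch turns those lines into a per-fork check for hardcoded upstream references. It assumes each fork is already cloned into a local directory named after the repository, which is a hypothetical layout. The helper names `upstream_owner`, `fork_name`, and `upstream_refs` are illustrative and do not come from `gh`.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for auditing the fork list printed above.

# "name <- owner/repo" -> "owner"
upstream_owner() {
  local parent="${1##*<- }"   # drop everything up to the last "<- "
  echo "${parent%%/*}"        # keep the part before the first "/"
}

# "name <- owner/repo" -> "name"
fork_name() {
  echo "${1%% <- *}"
}

# Print file:line:text for workflow lines that still reference "owner/".
upstream_refs() {
  local owner="$1" dir="$2"
  grep -rnF "$owner/" "$dir" 2>/dev/null
}

# Example loop, assuming each fork is cloned into ./<name>:
# gh repo list aRustyDev --fork --limit 100 --json name,parent \
#   -q '.[] | "\(.name) <- \(.parent.nameWithOwner)"' |
#   while IFS= read -r line; do
#     echo "== $(fork_name "$line")"
#     upstream_refs "$(upstream_owner "$line")" "$(fork_name "$line")/.github/workflows"
#   done
```

The search uses a fixed string (`grep -F`) on `owner/`, so owner names containing regex metacharacters cannot misfire. A bare mention of the owner without a slash is not reported.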
-For each fork: -1. Check for upstream-specific workflows -2. Disable or adapt as needed -3. Consider if fork is still needed +## Fork-Specific Issues Detection + +### Phase 0: Fork Detection & Complexity Assessment + +Before reviewing any repository, check if it's a fork and assess complexity: + +```bash +# Check if repo is a fork and get upstream info +gh repo view --json isFork,parent -q '{fork: .isFork, parent: .parent.nameWithOwner}' + +# Count workflows and complexity +echo "=== Workflow Complexity ===" +ls -1 .github/workflows/*.yml 2>/dev/null | wc -l | xargs echo "Workflow count:" +wc -l .github/workflows/*.yml 2>/dev/null | tail -1 | awk '{print "Total lines:", $1}' + +# Count action dependencies +echo "=== Action Dependencies ===" +grep -h "uses:" .github/workflows/*.yml 2>/dev/null | wc -l | xargs echo "Action references:" +grep -h "uses:" .github/workflows/*.yml 2>/dev/null | grep -oE '[^/]+/[^@]+' | sort -u | wc -l | xargs echo "Unique actions:" +``` + +### Fork Pattern Detection + +**Identify upstream-specific patterns that commonly break in forks:** + +| Pattern | Detection Command | Common Issues | +|---------|-------------------|---------------| +| External deploy target | `grep -r "external_repository:" .github/workflows/` | Deploys to upstream's gh-pages | +| Deploy keys | `grep -r "secrets.DEPLOY_KEY" .github/workflows/` | Secret doesn't exist in fork | +| Hardcoded org | `grep -r "google/timesketch" .github/workflows/` | Wrong target org | +| Upstream branches | `grep -r "branches: \[main\]" .github/workflows/` | Branch mismatch | +| Upstream composite actions | `grep -rE "uses: /.github/actions/" .github/workflows/` | Action path doesn't exist in fork | +| Hardcoded Docker namespace | `grep -rE "docker.*/" .github/workflows/` | Pushes to wrong Docker Hub namespace | +| External registries | `grep -rE "hub\.infinyon\.cloud" .github/workflows/` | Upstream-specific package registry | +| Upstream secrets | `grep -rE 
"secrets\.(ORG_|DOCKER_|SLACK_|AWS_)" .github/workflows/` | Organization secrets not available | + +**Comprehensive fork detection script:** +```bash +#!/bin/bash +# fork-patterns.sh - Detect upstream-specific patterns + +echo "=== Fork Pattern Detection ===" + +# External repositories and deploy keys +echo "--- Deploy Patterns ---" +grep -rE "external_repository:|DEPLOY_KEY|\.github/actions/" .github/workflows/ 2>/dev/null || echo "None found" + +# Organization and service secrets +echo "--- Secret Patterns ---" +grep -rE "secrets\.(ORG_|DOCKER_|SLACK_|AWS_)" .github/workflows/ 2>/dev/null || echo "None found" + +# External services and registries +echo "--- External Service Patterns ---" +grep -rE "https?://[a-z-]+\.[a-z]+\.(cloud|io)/" .github/workflows/ 2>/dev/null | grep -v github || echo "None found" + +# Hardcoded organization references +echo "--- Organization References ---" +repo_parent=$(gh repo view --json parent -q '.parent.nameWithOwner' 2>/dev/null) +if [ -n "$repo_parent" ]; then + parent_org=$(echo "$repo_parent" | cut -d'/' -f1) + echo "Searching for hardcoded references to upstream org: $parent_org" + grep -r "$parent_org" .github/workflows/ 2>/dev/null || echo "None found" +fi +``` + +### Fork Handling Options + +For each detected pattern, choose appropriate handling: + +| Option | When to Use | Implementation | +|--------|-------------|----------------| +| **Disable** | Deploy workflows, upstream-specific CI | `mv .github/workflows/deploy.yml .github/workflows/deploy.yml.disabled` | +| **Adapt** | Can be modified for your fork | Edit workflow to use your org/secrets | +| **Remove** | Not needed in fork | `rm .github/workflows/upstream-specific.yml` | +| **Keep** | Works as-is (rare) | No changes needed | + +**Quick disable script:** +```bash +#!/bin/bash +# disable-upstream-workflows.sh - Disable problematic workflows + +# Disable deploy workflows (most common) +for workflow in deploy publish release; do + for file in 
.github/workflows/${workflow}*.yml; do + if [ -f "$file" ]; then + echo "Disabling: $file" + mv "$file" "${file}.disabled" + fi + done +done + +# Find and suggest other workflows to disable +echo "=== Suggested Workflows to Review ===" +grep -l "external_repository\|DEPLOY_KEY\|ORG_" .github/workflows/*.yml 2>/dev/null | \ + while read -r file; do + echo "⚠️ Review: $file (contains upstream patterns)" + done +``` + +### Complexity Assessment + +**Complexity tiers (a repository falls into the highest tier reached by either its workflow count or its total line count):** + +| Tier | Workflows | Lines | Approach | +|------|-----------|-------|----------| +| Simple | 1-5 | <500 | Fix all in one PR | +| Medium | 6-10 | 500-1500 | Fix by priority, 1-2 PRs | +| Complex | 11-15 | 1500-3000 | Incremental fixes, multiple PRs | +| Massive | 16+ | 3000+ | Consider disable-first strategy | + +**For Complex/Massive forks:** + +1. Start with disabling non-essential workflows (quick win) +2. Focus on basic fixes (concurrency, path filters) for essential workflows +3. Address failures incrementally +4.
Document known limitations that won't be fixed + +**Assessment script:** +```bash +#!/bin/bash +# assess-complexity.sh - Determine review approach + +workflows=$(ls -1 .github/workflows/*.yml 2>/dev/null | wc -l) +# cat prints nothing when no workflow files exist, so wc -l still yields a number (0) +lines=$(cat .github/workflows/*.yml 2>/dev/null | wc -l) + +echo "Workflows: $workflows" +echo "Total lines: $lines" + +if [ "$workflows" -le 5 ] && [ "$lines" -le 500 ]; then + echo "Complexity: SIMPLE - Fix all in one PR" +elif [ "$workflows" -le 10 ] && [ "$lines" -le 1500 ]; then + echo "Complexity: MEDIUM - Fix by priority, 1-2 PRs" +elif [ "$workflows" -le 15 ] && [ "$lines" -le 3000 ]; then + echo "Complexity: COMPLEX - Incremental fixes, multiple PRs" +else + echo "Complexity: MASSIVE - Consider disable-first strategy" +fi + +# Check if it's a fork +if gh repo view --json isFork -q '.isFork' 2>/dev/null | grep -q true; then + echo "⚠️ FORK DETECTED - Apply fork-specific patterns first" +fi +``` ## Batch Commands diff --git a/components/skills/cicd-github-actions-ops/references/performance.md b/components/skills/cicd-github-actions-ops/references/performance.md new file mode 100644 index 00000000..8cd9156b --- /dev/null +++ b/components/skills/cicd-github-actions-ops/references/performance.md @@ -0,0 +1,584 @@ +# Performance Optimization Reference + +Comprehensive strategies for optimizing GitHub Actions workflow performance and reducing build times. + +## Overview + +Performance optimization focuses on reducing workflow execution time, resource usage, and runner costs while maintaining reliability.
+ + ## Performance Analysis Framework + +### Baseline Measurement + +```bash +#!/bin/bash +# measure-performance.sh - Establish performance baseline + +repo=${1:-$(gh repo view --json nameWithOwner -q .nameWithOwner)} + +echo "=== Performance Baseline: $repo ===" + +# Average wall-clock time (start to last update) of completed runs, by workflow +gh run list --repo "$repo" --limit 100 --json workflowName,status,startedAt,updatedAt | \ + jq -r '.[] | + select(.status == "completed" and .startedAt != null and .updatedAt != null) | + [.workflowName, + ((.updatedAt | fromdateiso8601) - (.startedAt | fromdateiso8601))] | + @tsv' | \ + awk -F'\t' ' + { + # Tab-separated input, so workflow names may contain spaces or commas + sum[$1] += $2 + count[$1]++ + } + END { + for (w in sum) { + avg = sum[w] / count[w] + # Leading numeric column is only used for sorting, then cut off + printf "%d\t%-30s %3dm %2ds (%d runs)\n", avg, w, int(avg/60), int(avg%60), count[w] + } + }' | sort -rn | cut -f2- +``` + +### Performance Metrics + +| Metric | Target | Warning | Critical | +|--------|--------|---------|----------| +| Total workflow time | <15 min | >30 min | >60 min | +| Queue time | <30 sec | >2 min | >5 min | +| Setup time (checkout, cache) | <1 min | >3 min | >5 min | +| Test execution | <10 min | >20 min | >45 min | +| Artifact upload/download | <1 min | >5 min | >10 min | + +## Caching Strategies + +### Language-Specific Caching + +#### Rust/Cargo + +```yaml +- name: Cache Cargo dependencies + uses: actions/cache@v4.1.2 + with: + path: | + ~/.cargo/bin/ + ~/.cargo/registry/index/ + ~/.cargo/registry/cache/ + ~/.cargo/git/db/ + target/ + key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }} + restore-keys: | + ${{ runner.os }}-cargo- +``` + +#### Node.js/npm + +```yaml +- name: Cache Node dependencies + uses: actions/cache@v4.1.2 + with: + path: ~/.npm + key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }} + restore-keys: | + 
${{ runner.os }}-node- + +- name: Cache Next.js build + uses: actions/cache@v4.1.2 + with: + path: | + ~/.npm + ${{ github.workspace }}/.next/cache + key: ${{ runner.os
}}-nextjs-${{ hashFiles('**/package-lock.json') }}-${{ hashFiles('**.[jt]s', '**.[jt]sx') }} +``` + +#### Python/pip + +```yaml +- name: Cache pip dependencies + uses: actions/cache@v4.1.2 + with: + path: ~/.cache/pip + key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }} + restore-keys: | + ${{ runner.os }}-pip- +``` + +#### Go + +```yaml +- name: Cache Go modules + uses: actions/cache@v4.1.2 + with: + path: | + ~/go/pkg/mod + ~/.cache/go-build + key: ${{ runner.os }}-go-${{ hashFiles('**/go.sum') }} + restore-keys: | + ${{ runner.os }}-go- +``` + +### Advanced Caching Patterns + +#### Multi-Layer Caching + +```yaml +# Cache both dependencies and build artifacts with fallbacks +- name: Cache dependencies + uses: actions/cache@v4.1.2 + with: + path: ~/.cargo + key: ${{ runner.os }}-deps-${{ hashFiles('**/Cargo.lock') }} + restore-keys: | + ${{ runner.os }}-deps- + ${{ runner.os }}- + +- name: Cache build artifacts + uses: actions/cache@v4.1.2 + with: + path: target/ + key: ${{ runner.os }}-build-${{ github.sha }} + restore-keys: | + ${{ runner.os }}-build-${{ github.head_ref }}- + ${{ runner.os }}-build-main- + ${{ runner.os }}-build- +``` + +#### Conditional Caching + +```yaml +- name: Cache when not dependabot + if: github.actor != 'dependabot[bot]' + uses: actions/cache@v4.1.2 + with: + path: ~/.cargo + key: cargo-${{ hashFiles('Cargo.lock') }} +``` + +## Parallelization Strategies + +### Job-Level Parallelization + +```yaml +jobs: + lint: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Run linter + run: cargo clippy + + test: + runs-on: ubuntu-latest + strategy: + matrix: + include: + - name: "unit-tests" + command: "cargo test --lib" + - name: "integration-tests" + command: "cargo test --test '*'" + - name: "doc-tests" + command: "cargo test --doc" + steps: + - uses: actions/checkout@v4 + - name: Run tests + run: ${{ matrix.command }} + + build: + needs: [lint, test] # Only run after lint/test pass + runs-on: 
ubuntu-latest + steps: + - uses: actions/checkout@v4 + - name: Build release + run: cargo build --release +``` + +### Step-Level Parallelization + +```yaml +- name: Run parallel tasks + run: | + # Run independent tasks in background and record their PIDs + npm run lint & + pids="$!" + npm run type-check & + pids="$pids $!" + npm run security-audit & + pids="$pids $!" + + # A bare 'wait' always exits 0, so wait on each PID to surface failures + status=0 + for pid in $pids; do + wait "$pid" || status=1 + done + + echo "All parallel tasks completed" + exit $status +``` + +### Matrix Optimization + +```yaml +strategy: + fail-fast: false # Don't cancel other matrix jobs on first failure + matrix: + os: [ubuntu-latest, windows-latest, macos-latest] + rust-version: [stable, beta] + include: + # Add extra combinations + - os: ubuntu-latest + rust-version: nightly + experimental: true + exclude: + # Remove expensive combinations + - os: windows-latest + rust-version: beta +``` + +## Resource Optimization + +### Runner Selection + +| Use Case | Runner Type | Cost Factor | Performance | +|----------|-------------|-------------|-------------| +| Basic CI (lint, test) | `ubuntu-latest` | 1x | Fast | +| Multi-OS testing | `windows-latest`, `macos-latest` | 2x, 10x | Moderate | +| Large builds | Larger runner, e.g. `ubuntu-latest-4-cores` (label defined by your org) | 2x | Very fast | +| GPU workloads | GPU larger runner (label defined by your org) | Higher; see billing docs | Specialized | + +Cost factors are relative per-minute rates for private repositories; standard GitHub-hosted runners are free for public repositories. 
+ +```yaml +# Choose appropriate runner size +jobs: + quick-checks: + runs-on: ubuntu-latest # 2 vCPUs for private repos (4 for public); enough for lint/small tests + + heavy-build: + runs-on: ubuntu-latest-4-cores # Larger-runner label configured by an org admin + + cross-platform: + runs-on: ${{ matrix.os }} + strategy: + matrix: + os: [ubuntu-latest, windows-latest] # Skip macOS if not needed +``` + +### Memory and Disk Management + +```yaml +- name: Free up disk space + run: | + # Remove preinstalled toolchains you don't use to free several GB + sudo rm -rf /usr/share/dotnet + sudo rm -rf /opt/ghc + sudo rm -rf "/usr/local/share/boost" + sudo rm -rf "$AGENT_TOOLSDIRECTORY" + + df -h + +- name: Optimize memory usage + run: | + # Limit parallel jobs based on available memory + JOBS=$(nproc) + MEM_GB=$(free -g | awk '/^Mem:/{print $2}') + + # Use fewer jobs if memory is limited (< 2GB per
job) + if [ $MEM_GB -lt $((JOBS * 2)) ]; then + JOBS=$((MEM_GB / 2)) + JOBS=$([ $JOBS -lt 1 ] && echo 1 || echo $JOBS) + fi + + echo "Using $JOBS parallel jobs" + echo "MAKEFLAGS=-j$JOBS" >> $GITHUB_ENV +``` + +## Build Optimization + +### Incremental Builds + +```yaml +# Rust incremental compilation: only pays off when target/ is restored from cache. +# For clean CI builds many projects set CARGO_INCREMENTAL=0 instead (Swatinem/rust-cache does so by default). +- name: Enable incremental builds + run: | + echo "CARGO_INCREMENTAL=1" >> $GITHUB_ENV + echo "CARGO_NET_RETRY=10" >> $GITHUB_ENV + +# Selective building (pull_request events; needs actions/checkout with fetch-depth: 0 so the base branch is available) +- name: Build only changed packages + env: + BASE_REF: ${{ github.base_ref }} + run: | + # Get list of files changed relative to the PR base branch + changed_files=$(git diff --name-only "origin/${BASE_REF}...HEAD") + + if echo "$changed_files" | grep -q "^frontend/"; then + echo "Frontend changed, building frontend" + (cd frontend && npm run build) # Subshell, so the next check starts from the repo root + fi + + if echo "$changed_files" | grep -q "^backend/"; then + echo "Backend changed, building backend" + (cd backend && cargo build --release) + fi +``` + +### Dependency Optimization + +```yaml +# Pin action versions for reproducible runs +- uses: actions/checkout@v4.2.2 # Specific version, not @v4 +- uses: actions/cache@v4.1.2 + +# Prefer the setup action's built-in caching +- name: Setup Node + uses: actions/setup-node@v4.1.0 + with: + node-version: '20' + cache: 'npm' # Built-in caching; replaces a separate actions/cache step + +# Minimize checkout scope +- uses: actions/checkout@v4 + with: + fetch-depth: 1 # Already the default (shallow clone); raise it only when you need history + # In cone mode (the default), files at the repository root such as Cargo.toml and Cargo.lock are always included + sparse-checkout: | + src/ +``` + +## Network Optimization + +### Artifact 
Management + +```yaml +# Upload only essential artifacts +- name: Upload test results + if: always() + uses: actions/upload-artifact@v4 + with: + name: test-results + # Lines inside "path: |" are literal patterns; "!" lines exclude matches (e.g. large object files) + path: | + target/debug/test-results.xml + !target/debug/**/*.o + retention-days: 7 # Shorter retention for non-critical artifacts + +# Compress large artifacts +- name: Compress logs + run: tar -czf logs.tar.gz logs/ + +- name: Upload compressed logs + uses: actions/upload-artifact@v4 + with: + name: logs + path: logs.tar.gz +``` +
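+
+#### Reusing Build Outputs Across Jobs
+
+Following on from the artifact guidance above, here is a minimal sketch of handing a compiled binary to a downstream job instead of rebuilding it there. The artifact name `app-binary`, the path `target/release/app`, and the `--version` smoke check are hypothetical placeholders. The `chmod` step is included because artifacts do not preserve file permissions.
+
+```yaml
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - run: cargo build --release
+      # Hypothetical artifact name and binary path
+      - uses: actions/upload-artifact@v4
+        with:
+          name: app-binary
+          path: target/release/app
+          retention-days: 1  # Only needed for the rest of this run
+
+  smoke-test:
+    needs: build
+    runs-on: ubuntu-latest
+    steps:
+      # No checkout or compile: fetch the binary built in the previous job
+      - uses: actions/download-artifact@v4
+        with:
+          name: app-binary
+          path: bin/
+      # Artifacts lose the executable bit, so restore it before running
+      - run: chmod +x bin/app && ./bin/app --version
+```
+
+The trade-off is upload and download time, which is often much smaller than a full release build. For jobs that need the whole source tree anyway, a cache may fit better.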
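+### Registry Optimization + +```yaml +# Point npm at a registry mirror or proxy for faster downloads +# (https://registry.npmjs.org/ is npm's default; replace it with your mirror's URL) +- name: Configure npm registry + run: npm config set registry https://registry.npmjs.org/ + +# Docker layer caching (the gha cache backend needs a Buildx builder) +- name: Set up Docker Buildx + uses: docker/setup-buildx-action@v3 + +- name: Build Docker image + uses: docker/build-push-action@v6 + with: + context: . + cache-from: type=gha + cache-to: type=gha,mode=max +``` + +## Performance Testing Workflows + +### Benchmark Integration + +```yaml +name: Performance Benchmarks +on: + push: + branches: [main] + pull_request: + paths: ['src/**', 'benches/**'] + +jobs: + benchmark: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + fetch-depth: 2 # HEAD~1 is needed to read the previous baseline + - name: Run benchmarks + run: cargo bench --bench=main + + - name: Performance regression check + run: | + # Criterion writes target/criterion/BENCH_ID/new/estimates.json (times in ns); + # "my_bench" is a placeholder for your benchmark ID + current_time=$(jq '.mean.point_estimate' target/criterion/my_bench/new/estimates.json) + # Baseline is stored as a git note on the previous commit (skipped if none exists yet) + git fetch origin refs/notes/benchmark:refs/notes/benchmark || true + baseline_time=$(git notes --ref=benchmark show HEAD~1 2>/dev/null || true) + + if [ -n "$baseline_time" ] && (( $(echo "$current_time > $baseline_time * 1.1" | bc -l) )); then + echo "❌ Performance regression detected: ${current_time}ns vs ${baseline_time}ns" + exit 1 + fi + + echo "✅ Performance within acceptable range" + # Record this result; push refs/notes/benchmark (needs contents: write) to keep it for later runs + git -c user.name=ci -c user.email=ci@localhost notes --ref=benchmark add -f -m "$current_time" HEAD +``` + +### Load Testing + +```yaml +- name: 
Performance load test + run: | + # Start application in background + ./target/release/app & + APP_PID=$!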
+ + # Wait for startup + sleep 5 + + # Run load test (ab ships in apache2-utils) and keep its report for parsing + ab -n 1000 -c 10 http://localhost:8080/health | tee ab_results.txt + + # Cleanup + kill $APP_PID + + # Check if response time is acceptable + avg_time=$(grep "Time per request:" ab_results.txt | head -1 | awk '{print $4}') + if (( $(echo "$avg_time > 100" | bc -l) )); then + echo "❌ Response time too high: ${avg_time}ms" + exit 1 + fi +``` + +## Cost Optimization + +### Runner Cost Analysis + +```bash +#!/bin/bash +# analyze-runner-costs.sh - Estimate workflow costs + +repo=${1:-$(gh repo view --json nameWithOwner -q .nameWithOwner)} + +echo "=== Runner Cost Analysis: $repo ===" + +# gh run list exposes no runner or duration fields, so read job timings from the REST API +for run_id in $(gh run list --repo "$repo" --limit 100 --status completed --json databaseId -q '.[].databaseId'); do + gh api --paginate "repos/$repo/actions/runs/$run_id/jobs" \ + -q '.jobs[] | select(.started_at != null and .completed_at != null) | + [(.labels[0] // "unknown"), ((.completed_at | fromdateiso8601) - (.started_at | fromdateiso8601))] | @tsv' +done | \ + awk -F'\t' ' + { + os = $1 # First runs-on label of the job + duration = $2 # Seconds + + # Per-minute rates for standard runners in private repos + # (rough; ignores per-job rounding to whole minutes, check current billing docs) + if (os ~ /ubuntu/) cost_per_min = 0.008 + else if (os ~ /windows/) cost_per_min = 0.016 + else if (os ~ /macos/) cost_per_min = 0.08 + else cost_per_min = 0.008 + + total_minutes[os] += duration / 60 + total_cost[os] += (duration / 60) * cost_per_min + runs[os]++ + } + END { + print "Label\t\tMinutes\tCost\tJobs\tAvg/Job" + for (os in total_minutes) { + printf "%-15s\t%.1f\t$%.2f\t%d\t%.1fm\n", + os, total_minutes[os], total_cost[os], runs[os], + total_minutes[os] / runs[os] + } + }' +``` + +### Optimization Recommendations + +```yaml +# Use YAML anchors to reduce duplication. 
The workflow schema rejects custom top-level +# keys such as .cache_config, so define the anchor on the first step that uses it +jobs: + test: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - &cargo_cache + uses: actions/cache@v4.1.2 + with: + path: ~/.cargo + key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }} + - run: cargo test + + build: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - *cargo_cache # Reuse cache configuration + - run: cargo build +``` + +## Monitoring Performance +
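+
+### Per-Job Durations
+
+To see which job dominates a run, here is a sketch of a final job that prints how long each finished job in the current run took, using `gh run view --json jobs`. The job name `build` and the output format are illustrative assumptions.
+
+```yaml
+jobs:
+  # ...existing jobs, including the hypothetical build job...
+  job-timings:
+    needs: [build]   # Hypothetical: list the jobs you want timed
+    if: always()     # Report even when a timed job fails
+    runs-on: ubuntu-latest
+    permissions:
+      actions: read
+    steps:
+      - name: Report job durations
+        env:
+          GH_TOKEN: ${{ github.token }}
+          GH_REPO: ${{ github.repository }}
+        run: |
+          # Prints one tab-separated "job name, seconds" line per finished, non-skipped job
+          gh run view "${{ github.run_id }}" --json jobs --jq '
+            .jobs[]
+            | select(.status == "completed" and .conclusion != "skipped")
+            | "\(.name)\t\((.completedAt | fromdateiso8601) - (.startedAt | fromdateiso8601))s"'
+```
+
+This job is still running when it queries the run, so the `status` filter leaves its own entry out of the report.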
+### Performance Alerts + +```yaml +# Add to workflow to track performance degradation +# Record the start time in the first step of the job +- name: Record start time + id: perf_start + run: echo "start_time=$(date +%s)" >> "$GITHUB_OUTPUT" + +# ... build steps ... + +- name: Performance monitoring + id: perf_check + run: | + start_time=${{ steps.perf_start.outputs.start_time }} + current_time=$(date +%s) + duration=$((current_time - start_time)) + echo "duration=$duration" >> "$GITHUB_OUTPUT" + + # Alert if build takes longer than 20 minutes + if [ $duration -gt 1200 ]; then + echo "⚠️ Build time exceeded 20 minutes: ${duration}s" + echo "performance_alert=true" >> $GITHUB_OUTPUT + fi + +- name: Create performance alert issue + if: steps.perf_check.outputs.performance_alert == 'true' + uses: actions/github-script@v7 + # Needs issues: write in the job's permissions + with: + script: | + await github.rest.issues.create({ + owner: context.repo.owner, + repo: context.repo.repo, + title: 'Performance Alert: Build time exceeded threshold', + body: `Build time exceeded 20 minutes in workflow: ${{ github.workflow }} + + **Run details:** + - Workflow: ${{ github.workflow }} + - Duration: ${{ steps.perf_check.outputs.duration }}s + - Commit: ${{ github.sha }} + - Runner: ${{ runner.os }} + + Please investigate and optimize the workflow.`, + labels: ['performance', 'ci'] + }); +``` + +### Dashboard Integration + +```bash +#!/bin/bash +# performance-report.sh - Generate performance report for dashboard + +{ + echo "{" + echo " \"timestamp\": \"$(date -I)\"," + echo " \"repository\": \"$(gh repo view --json nameWithOwner -q .nameWithOwner)\"," + + echo " \"metrics\": {" + # Average run time (seconds) of the last 10 completed runs; gh run list has no duration field + avg_time=$(gh run list --limit 10 --status completed --json startedAt,updatedAt | jq '[.[] | (.updatedAt | fromdateiso8601) - (.startedAt | fromdateiso8601)] | if length > 0 then add / length else 0 end') + 
echo " \"avg_build_time\": $avg_time," + + # Success rate among the last 50 completed runs + runs=$(gh run list --limit 50 --status completed --json conclusion) + total=$(echo "$runs" | jq length) + success=$(echo "$runs" | jq '[.[] | select(.conclusion == "success")] | length') + success_rate=$(( total > 0 ? success * 100 / total : 0 )) + echo " \"success_rate\": $success_rate," + + # Cache hit rate is not exposed by gh run list; fill it from your own cache metrics + echo " \"cache_hit_rate\": null" + echo " }" + echo "}" +} > performance-metrics.json +``` + +## Cross-References + +- [monitoring.md](monitoring.md) -
Set up monitoring for performance metrics +- [debugging.md](debugging.md) - Debug performance-related issues +- [security.md](security.md) - Ensure optimizations don't compromise security \ No newline at end of file diff --git a/components/skills/cicd-github-actions-ops/references/quick-reference.md b/components/skills/cicd-github-actions-ops/references/quick-reference.md new file mode 100644 index 00000000..dd9f666a --- /dev/null +++ b/components/skills/cicd-github-actions-ops/references/quick-reference.md @@ -0,0 +1,295 @@ +# Quick Reference + +Essential commands, patterns, and troubleshooting for GitHub Actions operations. + +## Essential Commands + +### Repository Analysis +```bash +# Check if repo is a fork and assess complexity +gh repo view --json isFork,parent -q '{fork: .isFork, parent: .parent.nameWithOwner}' +ls -1 .github/workflows/*.yml 2>/dev/null | wc -l + +# List failed runs +gh run list --status failure --limit 10 + +# Get logs for failed run +gh run view --log-failed + +# Re-run failed jobs only +gh run rerun --failed +``` + +### Workflow Validation +```bash +# Check YAML syntax and common issues +actionlint .github/workflows/*.yml + +# Find deprecated patterns +grep -r "actions-rs/\|set-output\|save-state" .github/workflows/ + +# Find actions pinned to a tag rather than a full commit SHA (tags can be moved) +grep -r "uses:.*@v[0-9]" .github/workflows/ +``` + +### Multi-Repository Operations +```bash +# List repos with workflows +for repo in $(gh repo list aRustyDev --limit 100 --json name -q '.[].name'); do + count=$(gh api "repos/aRustyDev/$repo/contents/.github/workflows" --jq 'length' 2>/dev/null) || count=0 + [ "$count" -gt 0 ] && echo "$repo: $count workflows" +done + +# Find repos with most failures +for repo in $(gh repo list aRustyDev --limit 50 --json name -q '.[].name'); do + failures=$(gh run list --repo "aRustyDev/$repo" --status failure --limit 100 --json conclusion -q 'length' 2>/dev/null || echo 0) + [ "$failures" -gt 0 ] && echo "$failures $repo" 
+done | sort -rn
+``` + +## Common Patterns + +### Security & Permissions +```yaml +# Minimal permissions template +permissions: + contents: read # Default for all jobs + +jobs: + deploy: + permissions: + contents: read # Clone repo + deployments: write # Deploy + id-token: write # OIDC authentication +``` + +### Performance & Efficiency +```yaml +# Concurrency control +concurrency: + group: ${{ github.workflow }}-${{ github.ref }} + cancel-in-progress: ${{ github.ref != 'refs/heads/main' }} + +# Path filters (paths and paths-ignore cannot be combined on one event; use ! patterns inside paths) +on: + push: + paths: ['src/**', 'Cargo.toml', '!**.md'] + +# Caching template +- uses: actions/cache@v4.1.2 + with: + path: ~/.cargo + key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }} + restore-keys: ${{ runner.os }}-cargo- +``` + +### Matrix Optimization +```yaml +strategy: + fail-fast: false + matrix: + os: [ubuntu-latest, windows-latest] + toolchain: [stable, beta] + include: + - os: ubuntu-latest + toolchain: nightly + experimental: true + exclude: + - os: windows-latest + toolchain: beta +``` + +## Quick Fixes + +### Fork Issues +| Issue | Quick Fix | +|-------|-----------| +| External deployment | `mv .github/workflows/deploy.yml .github/workflows/deploy.yml.disabled` | +| Wrong Docker namespace | Edit workflow: `s/upstream-org/your-org/g` | +| Missing secrets | Remove secret references or create tracking issue | +| Upstream composite actions | Replace with standard actions or disable | + +### Performance Issues +| Issue | Quick Fix | +|-------|-----------| +| No concurrency control | Add concurrency block to workflow | +| Broad triggers | Add path filters to `on:` section | +| No caching | Add language-specific cache action | +| Inefficient matrix | Remove unnecessary OS/version combinations | + +### Security Issues +| Issue | Quick Fix | +|-------|-----------| +| `permissions: write-all` | Replace with specific minimal permissions | +| Unpinned 
actions | Pin to a full commit SHA, or at least an exact version tag such as `@v4.1.0` | +| Exposed secrets | Use environment variables, add masking | +| Injection risks |
Validate inputs, use environment variables | + +## Troubleshooting Checklist + +### Pre-Review +- [ ] Check GitHub Status for service issues +- [ ] Identify whether the repository is a fork +- [ ] Assess workflow complexity (count workflows/lines) +- [ ] Look for recent changes to workflow files + +### During Review +- [ ] **Priority 1**: Ensure workflows are actually working (not just passing) +- [ ] **Priority 2**: Check for reasonable resource usage +- [ ] **Priority 3**: Debug and fix failing workflows +- [ ] **Priority 4**: Choose reliable over fancy actions +- [ ] **Priority 5**: Prefer third-party over self-hosted for new development +- [ ] **Priority 6**: Standardize patterns across repositories + +### Post-Review +- [ ] Validate changes with `actionlint` +- [ ] Verify action versions exist +- [ ] Create tracking issues for action decisions +- [ ] Document known limitations if applicable +- [ ] Test critical workflows after changes + +## Error Patterns + +### Permission Errors +| Error | Cause | Fix | +|-------|-------|-----| +| `Resource not accessible by integration` | Missing permissions | Add the missing scope in a `permissions:` block | +| `HttpError: 403` with Dependabot | Dependabot-triggered runs get a read-only `GITHUB_TOKEN` and only Dependabot secrets | Grant scopes explicitly with `permissions:`, add Dependabot secrets, or use `pull_request_target` (never with a checkout of PR code) | + +### Dependency Errors +| Error | Cause | Fix | +|-------|-------|-----| +| `npm ci` fails | `package-lock.json` out of sync with `package.json` | Run `npm install` locally and commit the updated `package-lock.json` | +| `cargo build` fails | Missing system dependencies | Add setup steps for dependencies | + +### Environment Errors +| Error | Cause | Fix | +|-------|-------|-----| +| `No space left on device` | Runner disk full | Clean up or use larger runner | +| Action version not found | Invalid version tag | Check action repository for valid versions | 
+| Timeout | Long-running process | Increase `timeout-minutes` or optimize the process | + +## Action Selection Reference + +### Reliable Actions (Prefer These) +| Purpose | Action | Version | Notes | +|---------|--------|---------|-------| +| Checkout |
`actions/checkout` | `@v4.1.0` | Standard, well-maintained | +| Cache | `actions/cache` | `@v4.1.2` | Official caching | +| Setup Node | `actions/setup-node` | `@v4.1.0` | Built-in npm caching | +| Upload artifact | `actions/upload-artifact` | `@v4` | Official artifact handling | + +### Language-Specific +| Language | Setup | Cache | Notes | +|----------|-------|-------|-------| +| Rust | `dtolnay/rust-toolchain` | `Swatinem/rust-cache` | Fast, efficient | +| Node.js | `actions/setup-node` | Built-in | Use `cache: 'npm'` | +| Python | `actions/setup-python` | Built-in | Use `cache: 'pip'` | +| Go | `actions/setup-go` | Built-in | Caching is enabled by default since v4 | + +### Security Actions +| Purpose | Action | Notes | +|---------|--------|-------| +| Secret scanning | `trufflesecurity/trufflehog` | Find exposed secrets | +| Dependency audit | Package-manager CLIs (`npm audit`; `cargo audit` from the `cargo-audit` crate) | Language-specific | +| Container scan | `aquasecurity/trivy-action` | Scan Docker images | + +## File Templates + +### Basic CI Workflow +```yaml +name: CI +on: + push: + branches: [main] + pull_request: + branches: [main] + +concurrency: + group: ${{ github.workflow }}-${{ github.ref }} + cancel-in-progress: ${{ github.ref != 'refs/heads/main' }} + +jobs: + test: + runs-on: ubuntu-latest + permissions: + contents: read + steps: + - uses: actions/checkout@v4.1.0 + - name: Setup + run: echo "Setup steps here" + - name: Test + run: echo "Test commands here" +``` + +### Minimal Dependabot Config +```yaml +# .github/dependabot.yml +version: 2 +updates: + - package-ecosystem: "github-actions" + directory: "/" + schedule: + interval: "monthly" +``` + +### Basic Auto-assign Workflow +```yaml +# .github/workflows/auto-assign.yml +name: Auto Assign +on: + issues: + types: [opened] + pull_request: + types: [opened] + +jobs: + assign: + runs-on: ubuntu-latest + permissions: + issues: write + pull-requests: write + steps: + - uses: 
actions/github-script@v7 + with: + script: |
await github.rest.issues.addAssignees({ + owner: context.repo.owner, + repo: context.repo.repo, + issue_number: context.issue.number, + assignees: ['${{ github.repository_owner }}'] + }); +``` + +## Complexity Decision Matrix + +| Workflows | Lines | Approach | Estimated Time | +|-----------|-------|----------|----------------| +| 1-5 | <500 | Fix all in one PR | 1-2 hours | +| 6-10 | 500-1500 | Fix by priority, 1-2 PRs | 3-6 hours | +| 11-15 | 1500-3000 | Incremental fixes, multiple PRs | 1-2 days | +| 16+ | 3000+ | Disable-first strategy | 2+ days | + +### When to Ask the User +- Complete fix requires >2 hours of refactoring +- Fix would change core project behavior +- Multiple equally valid approaches exist +- Fork has diverged significantly from upstream + +## Resource Links + +### Official Documentation +- [GitHub Actions Documentation](https://docs.github.com/en/actions) +- [Workflow syntax](https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions) +- [Security hardening](https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions) + +### Tools +- [actionlint](https://github.com/rhysd/actionlint) - Workflow linting +- [act](https://github.com/nektos/act) - Local workflow testing +- [GitHub CLI](https://cli.github.com/) - Command-line operations + +### Cross-References +- [debugging.md](debugging.md) - Detailed debugging procedures +- [security.md](security.md) - Comprehensive security hardening +- [performance.md](performance.md) - Performance optimization strategies +- [monitoring.md](monitoring.md) - Monitoring and alerting setup +- [multi-repo.md](multi-repo.md) - Multi-repository review workflows \ No newline at end of file diff --git a/components/skills/cicd-github-actions-ops/references/security.md b/components/skills/cicd-github-actions-ops/references/security.md new file mode 100644 index 00000000..6256ca32 --- /dev/null +++ b/components/skills/cicd-github-actions-ops/references/security.md @@ -0,0
+1,637 @@ +# Security Reference + +Comprehensive security hardening guide for GitHub Actions workflows in production environments. + +## Overview + +Security in GitHub Actions involves protecting secrets, preventing code injection, controlling permissions, and maintaining supply chain integrity. + +## Permission Management + +### Minimal Permission Principle + +Grant each job only the permissions it needs: + +```yaml +# Global default (restrictive) +permissions: + contents: read + +jobs: + test: + runs-on: ubuntu-latest + permissions: + contents: read # Clone repo + steps: + - uses: actions/checkout@v4 + - run: npm test + + deploy: + runs-on: ubuntu-latest + permissions: + contents: read # Clone repo + deployments: write # Create deployment + id-token: write # OIDC token + steps: + - name: Deploy + run: echo "Deploying..." +``` + +### Permission Reference Matrix + +| Permission | `read` allows | `write` allows | +|------------|---------------|----------------| +| `actions` | List workflow runs | Cancel, re-run workflows | +| `checks` | View check runs | Create check runs/suites | +| `contents` | Clone repository | Push commits, create releases | +| `deployments` | View deployments | Create deployments | +| `discussions` | View discussions | Create, edit discussions | +| `id-token` | - | Request OIDC token | +| `issues` | View issues | Create, edit issues | +| `packages` | Download packages | Publish packages | +| `pages` | View Pages | Deploy to Pages | +| `pull-requests` | View PRs | Create, edit PRs | +| `security-events` | View security alerts | Upload code scanning results, dismiss security alerts | +| `statuses` | View commit statuses | Create commit statuses | + +### Dynamic Permissions + +The `permissions` key does not accept expressions, so grant elevated permissions only to a job that is itself gated on the branch: + +```yaml +jobs: + deploy-production: + # The job only runs on main, so its elevated permissions are never granted elsewhere + if: github.ref == 
'refs/heads/main' + runs-on: ubuntu-latest + permissions: + contents: read + deployments: write + id-token: write + + steps: + - name: Deploy production + run: echo
"Deploying to production" +``` + +## Secret Management + +### Secret Security Best Practices + +```yaml +# ✅ GOOD: Proper secret usage +- name: Deploy with secrets + env: + API_TOKEN: ${{ secrets.API_TOKEN }} + run: | + # Secrets are masked in logs automatically; ::add-mask:: matters for values derived from them + echo "::add-mask::$API_TOKEN" + deploy.sh --token "$API_TOKEN" + +# ❌ BAD: Secret exposure risks +- name: Unsafe secret usage + run: | + # DON'T: Log secrets + echo "Token: ${{ secrets.API_TOKEN }}" + + # DON'T: Put secrets in artifacts + echo "${{ secrets.API_TOKEN }}" > token.txt + + # DON'T: Use secrets in URLs + curl "https://api.example.com/?token=${{ secrets.API_TOKEN }}" +``` + +### Secret Validation + +```yaml +- name: Validate required secrets + env: + API_TOKEN: ${{ secrets.API_TOKEN }} + DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }} + run: | + # Check presence through env vars so secret values never appear in the script text + if [ -z "$API_TOKEN" ]; then + echo "❌ API_TOKEN secret is not set" + exit 1 + fi + + if [ -z "$DEPLOY_KEY" ]; then + echo "❌ DEPLOY_KEY secret is not set" + exit 1 + fi + + echo "✅ All required secrets are available" +``` + +### Environment-Specific Secrets + +```yaml +jobs: + deploy: + runs-on: ubuntu-latest + environment: + name: ${{ github.ref == 'refs/heads/main' && 'production' || 'staging' }} + steps: + - name: Deploy + env: + # Resolved from the selected environment's variables and secrets + API_URL: ${{ vars.API_URL }} + API_TOKEN: ${{ secrets.API_TOKEN }} + run: | + echo "Deploying to: $API_URL" + deploy.sh --url "$API_URL" --token "$API_TOKEN" +``` + +## Input Validation & Injection Prevention + +### GitHub Context Injection + +```yaml +# ❌ DANGEROUS: Unvalidated input injection +- name: Unsafe input handling + run: | + # This allows code injection via issue title, PR title, etc.
+ echo "Processing: ${{ github.event.issue.title }}" + + # This is vulnerable to script injection + bash -c "echo 'Title: ${{ github.event.pull_request.title }}'" + +# ✅ SAFE: Proper input validation +- name: Safe input handling + env: + ISSUE_TITLE: ${{ github.event.issue.title }} + PR_TITLE: ${{ github.event.pull_request.title }} + run: | + # Use environment variables instead of direct interpolation + echo "Processing: $ISSUE_TITLE" + + # Validate input format before use + if [[ "$ISSUE_TITLE" =~ ^[a-zA-Z0-9[:space:][:punct:]]*$ ]]; then + echo "Title is safe: $ISSUE_TITLE" + else + echo "⚠️ Title contains potentially unsafe characters" + exit 1 + fi +``` + +### User Input Sanitization + +```yaml +- name: Sanitize user inputs + uses: actions/github-script@v7 + with: + script: | + // Safe way to handle user input + const title = context.payload.issue?.title || ''; + const sanitized = title + .replace(/[<>]/g, '') // Remove potential HTML + .substring(0, 100); // Limit length + + console.log(`Processing: ${sanitized}`); + + // Validate input format + if (!/^[a-zA-Z0-9\s\-_.]+$/.test(sanitized)) { + core.setFailed('Invalid characters in title'); + return; + } +``` + +### Safe Command Construction + +```yaml +- name: Build safe commands + run: | + # Use arrays and proper quoting + BRANCH="${{ github.head_ref }}" + + # Validate branch name format + if [[ ! 
"$BRANCH" =~ ^[a-zA-Z0-9/_-]+$ ]]; then + echo "❌ Invalid branch name format" + exit 1 + fi + + # Safe command construction + git checkout "$BRANCH" + + # For complex commands, use JSON or YAML + jq -n \ + --arg branch "$BRANCH" \ + --arg commit "${{ github.sha }}" \ + '{branch: $branch, commit: $commit}' > deploy.json +``` + +## Supply Chain Security + +### Action Pinning Strategy + +```yaml +# ✅ BEST: Pin to specific SHA (immutable) +- uses: actions/checkout@8ade135a41bc03ea155e62e844d188df1ea18608 # v4.1.0 + +# ✅ GOOD: Pin to specific version +- uses: actions/checkout@v4.1.0 + +# ⚠️ ACCEPTABLE: Pin to major version (gets patch updates) +- uses: actions/checkout@v4 + +# ❌ DANGEROUS: Use branch or tag that can change +- uses: actions/checkout@main +- uses: actions/checkout@latest +``` + +### Action Security Audit + +```bash +#!/bin/bash +# audit-actions.sh - Security audit for GitHub Actions + +echo "=== GitHub Actions Security Audit ===" + +# Find all action references +actions=$(grep -rh "uses:" .github/workflows/ | grep -oE '[^/]+/[^@]+@[^[:space:]]+' | sort -u) + +for action in $actions; do + repo=$(echo "$action" | cut -d@ -f1) + ref=$(echo "$action" | cut -d@ -f2) + + echo "--- $repo ---" + + # Check if pinned to SHA + if [[ "$ref" =~ ^[a-f0-9]{40}$ ]]; then + echo "✅ Pinned to SHA: $ref" + elif [[ "$ref" =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]; then + echo "⚠️ Pinned to version: $ref (consider SHA)" + elif [[ "$ref" =~ ^v[0-9]+$ ]]; then + echo "⚠️ Pinned to major: $ref (gets updates)" + else + echo "❌ Unpinned reference: $ref" + fi + + # Check action reputation (basic) + stars=$(gh api "repos/$repo" 2>/dev/null | jq -r '.stargazers_count // 0') + if [ "$stars" -gt 1000 ]; then + echo "✅ High reputation: $stars stars" + elif [ "$stars" -gt 100 ]; then + echo "⚠️ Medium reputation: $stars stars" + else + echo "⚠️ Low reputation: $stars stars" + fi + + # Check for archived status + archived=$(gh api "repos/$repo" 2>/dev/null | jq -r '.archived // false') + if [ 
"$archived" = "true" ]; then
+    echo "❌ Repository is archived"
+  fi
+
+  echo ""
+done
+```
+
+### Dependency Verification
+
+```yaml
+- name: Verify action pins
+  run: |
+    # Expected commit SHA for each critical action. Example values: confirm
+    # each SHA against the action's tagged release before relying on it.
+    cat > action-pins.txt << 'EOF'
+    actions/checkout@v4.1.0 8ade135a41bc03ea155e62e844d188df1ea18608
+    actions/cache@v4.1.2 ab5e6d0c87105b4c9c2047343972218f562e4319
+    EOF
+
+    # Every reference to each action must use the expected SHA
+    # (assumes the repository was checked out in an earlier step)
+    while read -r action expected_sha; do
+      repo="${action%@*}"
+      current=$(grep -rhoE "uses:[[:space:]]*${repo}@[^[:space:]#]+" .github/workflows/ \
+        | sed -E 's/.*@//' | sort -u)
+      if [ "$current" = "$expected_sha" ]; then
+        echo "✅ $action: SHA verified"
+      else
+        echo "❌ $action: SHA mismatch (current: ${current:-not-found}, expected: $expected_sha)"
+        exit 1
+      fi
+    done < action-pins.txt
+```
+
+### Private Action Security
+
+```yaml
+# For internal/private actions
+- name: Use internal action securely
+  uses: ./.github/actions/deploy
+  with:
+    environment: ${{ inputs.environment }}
+  env:
+    # Expose the secret to this step only, not to the whole job
+    DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}
+```
+
+## Network Security
+
+### HTTPS Enforcement
+
+```yaml
+- name: Secure external requests
+  run: |
+    # ✅ GOOD: Use HTTPS
+    curl -fsSL "https://api.github.com/repos/owner/repo"
+
+    # ✅ GOOD: curl verifies certificates by default; --cacert selects a specific CA bundle
+    curl -fsSL --cacert /etc/ssl/certs/ca-certificates.crt "https://api.example.com"
+
+    # ❌ BAD: Insecure connections
+    # curl "http://api.example.com"      # Unencrypted
+    # curl -k "https://api.example.com"  # Skips certificate verification
+```
+
+### API Token Security
+
+```yaml
+- name: Secure API interactions
+  run: |
+    # Prefer the built-in token: it is scoped to this repository and expires when the job ends
+    gh api "repos/$GITHUB_REPOSITORY/issues"
+
+    # For external APIs, read the token from the environment
+    curl -fsSL \
+      -H "Authorization: Bearer $EXTERNAL_TOKEN" \
+      -H "User-Agent: GitHub-Actions" \
+      "https://api.external-service.com/data"
+  # Duplicate env: keys are rejected in a workflow, so both tokens share one mapping
+  env:
+    GITHUB_TOKEN: ${{ github.token }}
+    EXTERNAL_TOKEN: ${{ secrets.EXTERNAL_API_TOKEN
}}
+```
+
+## OIDC (OpenID Connect) Authentication
+
+### AWS OIDC Setup
+
+```yaml
+jobs:
+  deploy-aws:
+    runs-on: ubuntu-latest
+    permissions:
+      id-token: write   # Allow the job to request an OIDC token
+      contents: read
+
+    steps:
+      - name: Configure AWS credentials
+        uses: aws-actions/configure-aws-credentials@v4
+        with:
+          role-to-assume: arn:aws:iam::123456789012:role/GitHubActions
+          role-session-name: GitHubActionsSession
+          aws-region: us-east-1
+          # No long-lived AWS keys needed: the role's trust policy validates the OIDC token
+
+      - name: Deploy to AWS
+        run: aws s3 sync ./dist s3://my-bucket/
+```
+
+### Azure OIDC Setup
+
+```yaml
+jobs:
+  deploy-azure:
+    runs-on: ubuntu-latest
+    permissions:
+      id-token: write
+      contents: read
+
+    steps:
+      - name: Azure Login
+        uses: azure/login@v1
+        with:
+          client-id: ${{ secrets.AZURE_CLIENT_ID }}
+          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
+          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
+
+      - name: Deploy to Azure
+        # app.zip is a placeholder for your build artifact
+        run: az webapp deploy --name myapp --resource-group mygroup --src-path app.zip
+```
+
+### Custom OIDC Claims
+
+```yaml
+- name: Debug OIDC token claims
+  # Requires `permissions: id-token: write`; otherwise ACTIONS_ID_TOKEN_REQUEST_* are not set
+  run: |
+    # Request a token for a custom audience
+    TOKEN=$(curl -sSf -H "Authorization: bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
+      "$ACTIONS_ID_TOKEN_REQUEST_URL&audience=custom-audience" | jq -r .value)
+    echo "::add-mask::$TOKEN"  # Never print the raw token
+
+    # Decode the JWT payload (debugging only): base64url -> base64, then pad
+    PAYLOAD=$(echo "$TOKEN" | cut -d. -f2 | tr '_-' '/+')
+    while [ $(( ${#PAYLOAD} % 4 )) -ne 0 ]; do PAYLOAD="${PAYLOAD}="; done
+
+    # Claims include iss (https://token.actions.githubusercontent.com), aud,
+    # sub (e.g. repo:owner/repo:ref:refs/heads/main), repository, ref, and sha
+    echo "$PAYLOAD" | base64 -d | jq '{iss, aud, sub, repository, ref, sha}'
+```
+
+## Environment Isolation
+
+### Environment Protection Rules
+
+```yaml
+# In repository settings, configure environment protection:
+# 1. Required reviewers
+# 2. Wait timer
+# 3.
Deployment branches (only main/release branches)
+# Job-level key: the job waits for these rules to pass before it starts
+environment:
+  name: production
+  url: https://myapp.production.com
+
+# Environment-specific variables and secrets are available only to jobs that reference this environment
+```
+
+### Runtime Environment Security
+
+```yaml
+- name: Secure runtime environment
+  run: |
+    # Each run step is a separate shell, so unset only affects the rest of this step;
+    # scoping secrets with step-level env: keeps them out of other steps entirely
+    unset AWS_SECRET_ACCESS_KEY
+    unset DATABASE_PASSWORD
+
+    # Remove temporary files that may hold secrets (matters most on self-hosted
+    # runners, which persist between jobs; GitHub-hosted runners are discarded)
+    find /tmp -name "*secret*" -delete 2>/dev/null || true
+```
+
+## Security Monitoring
+
+### Audit Logging
+
+```yaml
+- name: Security audit log
+  env:
+    # Expose the secret as env so its value is never written into the script
+    DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}
+  run: |
+    # Log security-relevant context using the default GITHUB_* variables
+    {
+      echo "SECURITY_AUDIT: $(date -Iseconds) - User: $GITHUB_ACTOR"
+      echo "SECURITY_AUDIT: Repository: $GITHUB_REPOSITORY"
+      echo "SECURITY_AUDIT: Event: $GITHUB_EVENT_NAME"
+      echo "SECURITY_AUDIT: Ref: $GITHUB_REF"
+    } >> security.log
+
+    # Record whether the deploy secret was made available to this job
+    if [ -n "$DEPLOY_TOKEN" ]; then
+      echo "SECURITY_AUDIT: DEPLOY_TOKEN available" >> security.log
+    fi
+
+- name: Upload security logs
+  if: always()
+  uses: actions/upload-artifact@v4
+  with:
+    name: security-logs
+    path: security.log
+    retention-days: 90  # Keep for compliance
+```
+
+### Vulnerability Scanning
+
+```yaml
+- name: Scan for secrets
+  # Pin to a release tag or commit SHA rather than @main (see Action Pinning Strategy);
+  # comparing base..head needs full history (actions/checkout with fetch-depth: 0)
+  uses: trufflesecurity/trufflehog@main
+  with:
+    path: ./
+    base: main
+    head: HEAD
+
+- name: Dependency vulnerability scan
+  run: |
+    # Keep only the commands for your ecosystem; each tool must be installed first
+    # For Node.js
+    npm audit --audit-level=high
+
+    # For Rust (requires cargo-audit: cargo install cargo-audit)
+    cargo audit
+
+    # For Python (requires pip-audit: pip install pip-audit)
+    pip-audit
+
+    # For container images and filesystems (official Docker Hub image: aquasec/trivy)
+    docker run --rm -v "$PWD":/app aquasec/trivy fs /app
+```
+
+### Security Alerts Integration
+
+```yaml
+- name: Security alert on failure
+  if: failure()
+  uses: actions/github-script@v7
+  with:
+    script: |
+      // Requires `permissions: issues: write` on the job
+      await github.rest.issues.create({
+        owner: context.repo.owner,
+        repo: context.repo.repo,
+        title: '🚨 Security workflow failure',
+        body: `
+        **Security Alert**
+
+        Workflow: ${{ github.workflow }}
+        Run: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
+        Actor: ${{ github.actor }}
+        Event: ${{ github.event_name }}
+
+        A security-related workflow has failed. Please investigate immediately.
+        `,
+        labels: ['security', 'urgent', 'incident']
+      });
+```
+
+## Security Checklist
+
+### Pre-Deployment Security Review
+
+- [ ] All actions pinned to specific versions or SHAs
+- [ ] No hardcoded secrets in workflow files
+- [ ] Minimal permissions granted to each job
+- [ ] Untrusted inputs passed through `env:` (never interpolated into scripts) and validated
+- [ ] External network calls use HTTPS with certificate verification
+- [ ] OIDC used instead of long-lived credentials where possible
+- [ ] Environment protection rules configured for production
+- [ ] Security scanning integrated into CI pipeline
+- [ ] Audit logging enabled for security-relevant actions
+
+### Regular Security Maintenance
+
+- [ ] Review and update pinned action versions quarterly
+- [ ] Audit repository secrets and remove unused ones
+- [ ] Review permission grants for over-privileged workflows
+- [ ] Update OIDC configurations when infrastructure changes
+- [ ] Review security scan results and address findings
+- [ ] Monitor for deprecated actions and security advisories
+- [ ] Test security controls with simulated attack scenarios
+
+## Incident Response
+
+### Security Incident Workflow
+
+```yaml
+# .github/workflows/security-incident.yml
+name: Security Incident Response
+on:
+  workflow_dispatch:
+    inputs:
+      incident_type:
+        description: 'Type of security incident'
+        required: true
+        type: choice
+        options:
+          - 'compromised-token'
+          - 'malicious-action'
+          - 'data-exposure'
+          - 'unauthorized-access'
+
+      description:
+        description: 'Brief description of the incident'
+        required: true
+
+jobs:
+  incident-response:
+    runs-on: ubuntu-latest
+    environment: security-team
+    permissions:
+      actions: write  # Needed to disable workflows
+      issues: write   # Needed to open the incident issue
+
+    steps:
+      - name: Disable affected workflows
+        if: inputs.incident_type == 'compromised-token'
+        env:
+          GH_TOKEN: ${{ github.token }}
+        run: |
+          # Disable every active workflow except this one. There is no checkout step,
+          # so list workflows through the API instead of reading .github/workflows
+          gh workflow list --repo "$GITHUB_REPOSITORY" --json id,path \
+            --jq '.[] | select(.path != ".github/workflows/security-incident.yml") | .id' |
+          while read -r id; do
+            gh workflow disable "$id" --repo "$GITHUB_REPOSITORY"
+          done
+
+      - name: Revoke tokens
+        if: inputs.incident_type == 'compromised-token'
+        run: |
+          # Instructions for manual token revocation
+          echo "⚠️ MANUAL ACTION REQUIRED:"
+          echo "1. Revoke the exposed credentials at their issuer (cloud provider, registry, PAT settings)"
+          echo "2. Generate new tokens/keys"
+          echo "3. Update the repository secrets with the new values"
+          echo "4. Re-enable workflows"
+
+      - name: Create incident issue
+        uses: actions/github-script@v7
+        env:
+          # Free-text inputs go through env so they cannot break out of the script
+          INCIDENT_TYPE: ${{ inputs.incident_type }}
+          INCIDENT_DESCRIPTION: ${{ inputs.description }}
+        with:
+          script: |
+            const { INCIDENT_TYPE, INCIDENT_DESCRIPTION } = process.env;
+            // workflow_dispatch events carry no head_commit, so record the current time
+            const issue = await github.rest.issues.create({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              title: `🚨 SECURITY INCIDENT: ${INCIDENT_TYPE}`,
+              body: `
+              **Incident Type:** ${INCIDENT_TYPE}
+              **Reporter:** ${context.actor}
+              **Time:** ${new Date().toISOString()}
+
+              **Description:**
+              ${INCIDENT_DESCRIPTION}
+
+              **Immediate Actions Taken:**
+              - [ ] Workflows disabled (if applicable)
+              - [ ] Tokens revoked (if applicable)
+              - [ ] Security team notified
+
+              **Investigation Tasks:**
+              - [ ] Determine scope of compromise
+              - [ ] Identify affected systems
+              - [ ] Review audit logs
+              - [ ] Document lessons learned
+              `,
+              labels: ['security', 'incident', 'P0'],
+              // Placeholder login: replace with a real member of your security team
+              assignees: ['security-team-lead']
+            });
+
+            console.log(`Created incident issue: ${issue.data.html_url}`);
+```
+
+## Cross-References
+
+- [debugging.md](debugging.md) - Debug security-related failures (permissions, tokens)
+- [monitoring.md](monitoring.md) - Monitor security events and alerts
+- [performance.md](performance.md) - Ensure security controls don't impact performance unnecessarily
\ No newline at end of file