Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
256 changes: 72 additions & 184 deletions components/skills/cicd-github-actions-ops/SKILL.md
Original file line number Diff line number Diff line change
Expand Up @@ -157,218 +157,101 @@ Use consistent patterns across all repositories.

## Systematic Review Workflow

### Phase 0: Fork Detection
### Quick Assessment

Before reviewing, check if the repository is a fork:
Before detailed review, assess complexity and identify fork-specific issues:

```bash
# Check if repo is a fork
# Check if repo is a fork and assess complexity
gh repo view --json isFork,parent -q '{fork: .isFork, parent: .parent.nameWithOwner}'
echo "=== Workflow Count ===" && ls -1 .github/workflows/*.yml 2>/dev/null | wc -l
```

**If forked, identify upstream-specific patterns:**
**Workflow complexity tiers:**

| Pattern | Detection | Common Issues |
|---------|-----------|---------------|
| External deploy target | `external_repository:` in workflow | Deploys to upstream's gh-pages |
| Deploy keys | `secrets.DEPLOY_KEY` | Secret doesn't exist in fork |
| Hardcoded org | `google/timesketch` in workflow | Wrong target org |
| Upstream branches | `branches: [main]` when fork uses `master` | Branch mismatch |
| Upstream composite actions | `uses: <upstream>/.github/actions/` | Action path doesn't exist in fork |
| Hardcoded Docker namespace | `docker.*<upstream-org>/` | Pushes to wrong Docker Hub namespace |
| External registries | `hub.infinyon.cloud` or similar | Upstream-specific package registry |
| Upstream secrets | `secrets.ORG_*` or `secrets.DOCKER_*` | Organization secrets not available |
| Tier | Workflows | Approach |
|------|-----------|----------|
| Simple | 1-5 | Fix all in one PR |
| Medium | 6-10 | Fix by priority, 1-2 PRs |
| Complex | 11+ | Incremental fixes, multiple PRs |

```bash
# Comprehensive fork detection
grep -rE "external_repository:|DEPLOY_KEY|\.github/actions/" .github/workflows/
grep -rE "secrets\.(ORG_|DOCKER_|SLACK_|AWS_)" .github/workflows/
grep -rE "https?://[a-z-]+\.[a-z]+\.(cloud|io)/" .github/workflows/ | grep -v github
```

**Fork handling options:**

1. **Disable** - Rename to `.yml.disabled` (recommended for deploy workflows)
2. **Adapt** - Modify to work with your fork
3. **Remove** - Delete if not needed
4. **Keep** - Leave as-is if it will work (rare)

```bash
# Disable a workflow
mv .github/workflows/deploy.yml .github/workflows/deploy.yml.disabled

# Find upstream-specific patterns
grep -r "external_repository\|DEPLOY_KEY\|google/" .github/workflows/
```

### Phase 0.5: Complexity Assessment

Before diving into fixes, assess the scope of work:

```bash
# Count workflows and total lines
echo "=== Workflow Complexity ==="
ls -1 .github/workflows/*.yml 2>/dev/null | wc -l | xargs echo "Workflow count:"
wc -l .github/workflows/*.yml 2>/dev/null | tail -1 | awk '{print "Total lines:", $1}'

# Count action dependencies
echo "=== Action Dependencies ==="
grep -h "uses:" .github/workflows/*.yml 2>/dev/null | wc -l | xargs echo "Action references:"
grep -h "uses:" .github/workflows/*.yml 2>/dev/null | grep -oE '[^/]+/[^@]+' | sort -u | wc -l | xargs echo "Unique actions:"

# Count job dependencies (complexity indicator)
echo "=== Job Dependencies ==="
grep -c "needs:" .github/workflows/*.yml 2>/dev/null | awk -F: '{sum+=$2} END {print "Total needs: clauses:", sum}'

# Matrix sprawl check
echo "=== Matrix Size ==="
grep -A20 "matrix:" .github/workflows/*.yml 2>/dev/null | grep -E "^\s+-\s" | wc -l | xargs echo "Matrix entries:"
```

**Complexity tiers:**
**Fork-specific patterns to detect:**
- `external_repository:` or `secrets.DEPLOY_KEY` → Upstream deployment
- `secrets.(ORG_|DOCKER_|AWS_)` → Organization secrets not available
- `uses: <upstream>/.github/actions/` → Composite actions missing in fork

| Tier | Workflows | Lines | Approach |
|------|-----------|-------|----------|
| Simple | 1-5 | <500 | Fix all in one PR |
| Medium | 6-10 | 500-1500 | Fix by priority, 1-2 PRs |
| Complex | 11+ | 1500+ | Incremental fixes, multiple PRs |
| Massive | 15+ | 3000+ | Consider disable-first strategy |
See: [multi-repo.md](references/multi-repo.md) for detailed fork handling and complexity assessment.

**If complexity is High/Massive:**
### Review Process

1. Start with disabling non-essential workflows
2. Focus on Priority 2 fixes (concurrency, path filters) first
3. Address failures incrementally
4. Document known limitations that won't be fixed
1. **Categorize Issues**: Working vs. reasonable vs. passing vs. efficient
2. **Fix by Priority**: Follow Priority 1-6 order (see above)
3. **Track Decisions**: Create tracking issues for action choices
4. **Validate Changes**: Use `actionlint` and version checks
5. **Document Limitations**: When partial fixes are acceptable

### Phase 1: Gather Information
See: [debugging.md](references/debugging.md) for detailed systematic debugging process.

```bash
# List all open PRs across your repos
gh search prs --author aRustyDev --state open --limit 100
## Security & Hardening

# List failed workflow runs
gh run list --repo <owner>/<repo> --status failure --limit 20
Essential security practices for production workflows:

# Get workflow files for a repo
gh api repos/<owner>/<repo>/contents/.github/workflows | jq -r '.[].name'
### Token Permissions
```yaml
permissions:
contents: read # Default, restrict further if possible
pull-requests: write # Only for PR workflows
id-token: write # Only for OIDC authentication
```

### Phase 2: Categorize Issues

For each PR/failure, categorize:

1. **Workflow broken** - Action itself has bugs
2. **Workflow inefficient** - Runs unnecessarily
3. **Test failure** - Code issue, not workflow
4. **Permission issue** - Token/access problems
5. **Environment issue** - Runner/dependency problems
6. **Flaky test** - Intermittent failures

### Phase 3: Fix by Category

| Category | Action |
|----------|--------|
| Workflow broken | Fix workflow, update action versions |
| Workflow inefficient | Add path filters, concurrency |
| Test failure | Fix code, not workflow |
| Permission issue | Adjust permissions block |
| Environment issue | Pin versions, add setup steps |
| Flaky test | Add retry or fix root cause |

### Phase 4: Track Decisions

For every non-trivial decision, create appropriate tracking:
### Common Security Issues

- **Chose reliable over fancy** → Issue in `arustydev/gha`
- **Chose third-party over self-hosted** → Issue in `arustydev/gha`
- **Found bug in action** → Issue in action's repo
- **Need new action** → Issue in `arustydev/gha`

### Phase 5: Validate Before Committing

Before committing workflow changes, validate them:
| Issue | Risk | Fix |
|-------|------|-----|
| `permissions: write-all` | Full repo access | Use minimal permissions |
| Secrets in logs | Credential exposure | Use `::add-mask::` for outputs |
| Untrusted input in scripts | Code injection | Sanitize `${{ github.event.* }}` |
| Third-party actions without pinning | Supply chain attacks | Pin to commit SHA: `@abc123...` |

### Security Validation
```bash
# 1. Check YAML syntax and common issues
actionlint .github/workflows/*.yml
# Check for overprivileged workflows
grep -r "permissions:" .github/workflows/ | grep -E "(write-all|write.*write)"

# 2. Verify action versions exist
for action in $(grep -h "uses:" .github/workflows/*.yml | grep -oE '[^/]+/[^@]+@v[0-9]+' | sort -u); do
repo=$(echo "$action" | cut -d@ -f1)
version=$(echo "$action" | cut -d@ -f2)
echo -n "$action: "
gh api "repos/$repo/git/refs/tags/$version" --silent && echo "OK" || echo "NOT FOUND"
done
# Find unpinned actions (security risk)
grep -r "uses:.*@v[0-9]" .github/workflows/

# 3. Check for deprecated actions
grep -r "actions-rs/\|set-output\|save-state" .github/workflows/ && echo "WARNING: Deprecated patterns found"
# Check for potential injection points
grep -r "github\.event\." .github/workflows/
```

**Common validation failures:**

| Error | Cause | Fix |
|-------|-------|-----|
| `action version not found` | Invalid version (v6 doesn't exist) | Check [action-selection.md](references/action-selection.md) for valid versions |
| `set-output is deprecated` | Old output syntax | Use `echo "name=value" >> $GITHUB_OUTPUT` |
| `save-state is deprecated` | Old state syntax | Use `echo "name=value" >> $GITHUB_STATE` |

### Phase 6: Partial Fixes and Known Limitations

Not every issue can or should be fully fixed. Know when to stop.

**When to accept a partial fix:**

| Situation | Action |
|-----------|--------|
| Fixing requires rewriting >50% of workflow | Disable or document limitation |
| Need to create custom actions for fork | Document as future work |
| External service dependencies can't be removed | Disable affected jobs/workflows |
| Upstream architecture tightly coupled | Accept reduced CI coverage |

**Documenting known limitations:**

When creating a PR with partial fixes, include a "Known Limitations" section:
See: [security.md](references/security.md) for comprehensive security hardening guide.

```markdown
### Known Limitations
## Performance Optimization

The following issues remain after this fix:
Key areas for workflow performance improvement:

| Issue | Reason | Impact |
|-------|--------|--------|
| `cli_smoke` job fails | Uses upstream's Infinyon Hub | Integration tests don't run |
| Docker builds use wrong namespace | Would require forking build scripts | Images not pushed |

These would require significant refactoring to address.
### Concurrency Control
```yaml
concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
```

**When to ask the user:**

If any of these apply, use AskUserQuestion before proceeding:
### Efficient Triggers
- Use `paths:` filters to avoid unnecessary runs
- Prefer `pull_request` over `push` for PR validation
- Use `workflow_dispatch` for manual triggers instead of broad automation

- Complete fix requires >2 hours of refactoring
- Fix would change core project behavior
- Multiple equally valid approaches exist
- Fork has diverged significantly from upstream

**Incremental progress strategy:**

For complex repositories, prefer multiple small PRs:

```
PR 1: Disable non-essential workflows (quick win)
PR 2: Add concurrency blocks to remaining workflows
PR 3: Fix path filters and triggers
PR 4: Address specific test failures
(Optional) PR 5: Deep refactoring if needed
### Caching Strategies
```yaml
- uses: actions/cache@v4.1.2
with:
path: ~/.cargo
key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
```

Each PR should be independently mergeable and improve the situation.
See: [performance.md](references/performance.md) for optimization strategies and monitoring.

## Quick Commands

Expand Down Expand Up @@ -407,7 +290,12 @@ done

## See Also

- Reference: [debugging.md](references/debugging.md) - Detailed debugging guide
- Reference: [action-selection.md](references/action-selection.md) - Action selection criteria
- Reference: [issue-templates.md](references/issue-templates.md) - Issue templates for tracking
- Reference: [multi-repo.md](references/multi-repo.md) - Multi-repository batch review
- **Quick Reference**: [quick-reference.md](references/quick-reference.md) - Essential commands and patterns
- **Deep Dive References**:
- [debugging.md](references/debugging.md) - Systematic debugging procedures
- [security.md](references/security.md) - Security hardening and best practices
- [performance.md](references/performance.md) - Performance optimization strategies
- [monitoring.md](references/monitoring.md) - Monitoring and alerting setup
- [multi-repo.md](references/multi-repo.md) - Multi-repository review workflows
- [action-selection.md](references/action-selection.md) - Action selection criteria
- [issue-templates.md](references/issue-templates.md) - Issue templates for tracking
Loading