-
Notifications
You must be signed in to change notification settings - Fork 1.9k
ci: add banned-files check to block secrets, binaries, and OS artifacts #674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,100 @@ | ||||||
| # SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. | ||||||
| # SPDX-License-Identifier: Apache-2.0 | ||||||
|
|
||||||
| name: banned-files | ||||||
|
|
||||||
| on: | ||||||
| pull_request: | ||||||
| types: [opened, synchronize, reopened] | ||||||
|
|
||||||
| permissions: | ||||||
| contents: read | ||||||
|
|
||||||
| jobs: | ||||||
| check: | ||||||
| runs-on: ubuntu-latest | ||||||
| timeout-minutes: 2 | ||||||
| steps: | ||||||
| - name: Checkout | ||||||
| uses: actions/checkout@v4 | ||||||
| with: | ||||||
| fetch-depth: 0 | ||||||
|
|
||||||
| - name: Check for banned files | ||||||
| run: | | ||||||
| # Banned file patterns and reasons. | ||||||
| # Add new entries here — one pattern per line, tab-separated. | ||||||
| BANNED_PATTERNS=( | ||||||
| '.env' 'May contain API keys or secrets' | ||||||
| '.env.*' 'May contain API keys or secrets' | ||||||
| '.envrc' 'direnv secrets file' | ||||||
| '*.pem' 'Private key or certificate' | ||||||
| '*.key' 'Private key' | ||||||
| '*.p12' 'PKCS#12 keystore' | ||||||
| '*.pfx' 'PKCS#12 keystore' | ||||||
| '*.jks' 'Java keystore' | ||||||
| '*.keystore' 'Keystore file' | ||||||
| 'credentials.json' 'Cloud provider credentials' | ||||||
| 'service-account*.json' 'Cloud provider service account' | ||||||
| '.npmrc' 'May contain registry tokens' | ||||||
| '.netrc' 'May contain login credentials' | ||||||
| '*_rsa' 'SSH private key' | ||||||
| '*_ed25519' 'SSH private key' | ||||||
| '*_ecdsa' 'SSH private key' | ||||||
| '.DS_Store' 'macOS Finder metadata' | ||||||
| 'Thumbs.db' 'Windows Explorer metadata' | ||||||
| 'desktop.ini' 'Windows Explorer metadata' | ||||||
| '*.pyc' 'Python bytecode' | ||||||
| ) | ||||||
|
|
||||||
| # Paths exempt from the check (test fixtures, etc.) | ||||||
| EXEMPT_DIRS="^(test/fixtures/|testdata/)" | ||||||
|
|
||||||
| # Get files changed in this PR | ||||||
| CHANGED=$(git diff --name-only --diff-filter=ACR "${{ github.event.pull_request.base.sha }}" HEAD) | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
# Verify current diff-filter usage in workflow
rg -n --type=yaml 'git diff --name-only --diff-filter=' .github/workflows/banned-files.yamlRepository: NVIDIA/NemoClaw Length of output: 170 🏁 Script executed: cat -n .github/workflows/banned-files.yaml | head -80Repository: NVIDIA/NemoClaw Length of output: 3596 🏁 Script executed: cat -n .github/workflows/banned-files.yaml | tail -30Repository: NVIDIA/NemoClaw Length of output: 1364 🏁 Script executed: # Check if there are any banned files already tracked in the repository
git ls-files | grep -E '\.(env|pem|key|p12|pfx|jks|keystore)$|credentials\.json|service-account.*\.json|\.npmrc|\.netrc|_rsa|_ed25519|_ecdsa|\.DS_Store|Thumbs\.db|desktop\.ini|\.pyc' | head -20Repository: NVIDIA/NemoClaw Length of output: 41 🏁 Script executed: # Check the git diff behavior difference between HEAD and head.sha in the context
# Let's also verify how the checkout action handles the ref
git rev-parse HEAD
git log --oneline -1Repository: NVIDIA/NemoClaw Length of output: 179 🏁 Script executed: # Let's verify what the diff-filter options mean and check the actual behavior
# in a PR context. Let's also see if there are any comments or docs about this choice
git log -p --follow -S 'diff-filter=ACR' -- .github/workflows/banned-files.yaml | head -100Repository: NVIDIA/NemoClaw Length of output: 41 🏁 Script executed: # Let's check if there are any other workflow files or documentation that explain the design decision
find .github -type f -name "*.yaml" -o -name "*.yml" | xargs grep -l "diff-filter" 2>/dev/nullRepository: NVIDIA/NemoClaw Length of output: 94 Include modified files in the scan set. Line 54 uses Additionally, replace Suggested fix- CHANGED=$(git diff --name-only --diff-filter=ACR "${{ github.event.pull_request.base.sha }}" HEAD)
+ CHANGED=$(git diff --name-only --diff-filter=ACMR "${{ github.event.pull_request.base.sha }}" "${{ github.event.pull_request.head.sha }}")📝 Committable suggestion
Suggested change
🤖 Prompt for AI Agents |
||||||
|
|
||||||
| if [ -z "$CHANGED" ]; then | ||||||
| echo "No changed files to check." | ||||||
| exit 0 | ||||||
| fi | ||||||
|
|
||||||
| FAILED=0 | ||||||
| VIOLATIONS="" | ||||||
|
|
||||||
| i=0 | ||||||
| while [ $i -lt ${#BANNED_PATTERNS[@]} ]; do | ||||||
| pattern="${BANNED_PATTERNS[$i]}" | ||||||
| reason="${BANNED_PATTERNS[$((i+1))]}" | ||||||
| i=$((i+2)) | ||||||
|
|
||||||
| # Convert glob to grep pattern: *.ext -> \.ext$, exact -> /exact$|^exact$ | ||||||
| while IFS= read -r file; do | ||||||
| # Skip exempt paths | ||||||
| if echo "$file" | grep -qE "$EXEMPT_DIRS"; then | ||||||
| continue | ||||||
| fi | ||||||
|
|
||||||
| # Match the basename against the glob pattern | ||||||
| basename=$(basename "$file") | ||||||
| if [[ "$basename" == $pattern ]]; then | ||||||
| VIOLATIONS="${VIOLATIONS}\n ❌ ${file} — ${reason}" | ||||||
| FAILED=1 | ||||||
| fi | ||||||
| done <<< "$CHANGED" | ||||||
| done | ||||||
|
|
||||||
| if [ "$FAILED" -eq 1 ]; then | ||||||
| echo "" | ||||||
| echo "============================================" | ||||||
| echo " Banned files detected in this PR" | ||||||
| echo "============================================" | ||||||
| echo -e "$VIOLATIONS" | ||||||
| echo "" | ||||||
| echo "These files should not be committed to the repository." | ||||||
| echo "If this is a false positive (e.g., a test fixture)," | ||||||
| echo "place the file under test/fixtures/ or testdata/." | ||||||
| echo "" | ||||||
| exit 1 | ||||||
| fi | ||||||
|
|
||||||
| echo "✅ No banned files found." | ||||||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
Repository: NVIDIA/NemoClaw
Length of output: 160
🏁 Script executed:
Repository: NVIDIA/NemoClaw
Length of output: 4429
Directory build artifacts should be explicitly blocked in the deny list with full-path matching.
The BANNED_PATTERNS list is missing
node_modules/,dist/, and__pycache__/entries. Additionally, line 79 only checks the basename, which prevents path-based patterns from working. A directory artifact likenode_modules/package/index.jswould not be caught since the basename check only seesindex.js.Add the missing directory patterns and update the matching logic to check the full file path:
Suggested fix
BANNED_PATTERNS=( '*.pyc' 'Python bytecode' + 'node_modules/*' 'Dependency vendor directory' + 'dist/*' 'Build artifact directory' + '__pycache__/*' 'Python cache directory' )🤖 Prompt for AI Agents