56 changes: 56 additions & 0 deletions .github/workflows/guard-couchdb-data.yml
@@ -0,0 +1,56 @@
# Blocks NEW data files (csv/json/etc.) added under src/couchdb/ or any
# subfolder, unless their path is listed in src/couchdb/.allowed_datafiles.
# To permit a new data file, add its path to that allowlist in the same PR.
# Uses only actions/checkout (GitHub-created) so it runs under the IBM org
# action policy. Place at: .github/workflows/guard-couchdb-data.yml
name: Guard couchdb data files

on:
  pull_request:
    paths:
      - 'src/couchdb/**'

permissions:
  contents: read

jobs:
  guard:
    name: No unapproved data files
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Check for disallowed data files
        env:
          BASE_SHA: ${{ github.event.pull_request.base.sha }}
          HEAD_SHA: ${{ github.event.pull_request.head.sha }}
        run: |
          set -euo pipefail
          # Data-file extensions we want to gate (edit to taste).
          EXT='\.(csv|tsv|json|jsonl|ndjson|parquet|xls|xlsx|feather|h5|hdf5|pkl|pickle|npy|npz|db|sqlite|sqlite3|avro|orc)$'
          ALLOW="src/couchdb/.allowed_datafiles"

          # Files newly ADDED in this PR under src/couchdb that look like data.
          added=$(git diff --name-only --diff-filter=A "$BASE_SHA" "$HEAD_SHA" -- src/couchdb \
            | grep -iE "$EXT" || true)

          violations=""
          while IFS= read -r f; do
            [ -z "$f" ] && continue
            if ! grep -qxF "$f" "$ALLOW" 2>/dev/null; then
              violations="${violations}${f}"$'\n'
            fi
          done <<< "$added"

          if [ -n "$violations" ]; then
            echo "::error::New data files are not allowed under src/couchdb/ unless allowlisted."
            echo "Disallowed additions:"
            printf '%s' "$violations" | sed 's/^/ - /'  # quoted: paths containing spaces stay on one line
            echo ""
            echo "If a file is intentional, add its exact path to ${ALLOW} in this PR"
            echo "(a maintainer must review that change)."
            exit 1
          fi
          echo "OK: no unapproved data files added under src/couchdb/"
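Review note: the allowlist check in the `run:` step above can be tried locally without GitHub Actions. What follows is a minimal sketch under stated assumptions, not part of this PR. `find_violations` is a hypothetical helper name, and this is not the content of `scripts/check_couchdb_data.sh`, which this diff does not show. It reads candidate paths on stdin, keeps those matching the same extension pattern, and prints the ones missing from the allowlist file passed as its first argument.

```bash
# Hypothetical local helper (assumption: not in the PR). Same extension
# pattern as the workflow's EXT variable.
EXT='\.(csv|tsv|json|jsonl|ndjson|parquet|xls|xlsx|feather|h5|hdf5|pkl|pickle|npy|npz|db|sqlite|sqlite3|avro|orc)$'

# Usage: <list of paths on stdin> | find_violations <allowlist-file>
# Prints each data-like path that is not an exact line in the allowlist.
find_violations() {
  allowlist="$1"
  grep -iE "$EXT" | while IFS= read -r f; do
    [ -z "$f" ] && continue
    # -x: whole-line match, -F: fixed string, so paths are compared literally.
    if ! grep -qxF -- "$f" "$allowlist" 2>/dev/null; then
      printf '%s\n' "$f"
    fi
  done
}
```

One possible invocation, assuming the base branch is available locally as `origin/main` (an assumption; substitute the real base ref): `git diff --name-only --diff-filter=A origin/main HEAD -- src/couchdb | find_violations src/couchdb/.allowed_datafiles`. As in the workflow, a missing allowlist file means every data-like path is reported.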
56 changes: 56 additions & 0 deletions .github/workflows/secret-scan.yml
@@ -0,0 +1,56 @@
# Server-side secret scanning that uses NO third-party Actions, so it runs
# under the IBM org policy "Allow IBM, and select non-IBM, actions".
# Only actions/checkout (GitHub-created, already allowed) is used; the scanners
# are installed via plain shell steps, which the org action policy does not gate.
# Place this file at: .github/workflows/secret-scan.yml
name: Secret Scan

on:
  push:
    branches: ['**']
  pull_request:
    branches: ['**']
  schedule:
    - cron: '0 6 * * 1' # weekly full-history sweep, Mondays 06:00 UTC

permissions:
  contents: read

jobs:
  gitleaks:
    name: Gitleaks
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # full history so the scan covers all commits

      - name: Install gitleaks
        run: |
          VERSION=8.21.2
          curl -sSL "https://github.com/gitleaks/gitleaks/releases/download/v${VERSION}/gitleaks_${VERSION}_linux_x64.tar.gz" \
            | tar -xz -C /usr/local/bin gitleaks
          gitleaks version

      - name: Run gitleaks
        run: gitleaks detect --source . --redact --verbose
        # gitleaks auto-loads .gitleaks.toml from the repo root and
        # exits non-zero if any secret is found, failing the check.

  trufflehog:
    name: TruffleHog
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Install trufflehog
        run: |
          curl -sSfL https://raw.githubusercontent.com/trufflesecurity/trufflehog/main/scripts/install.sh \
            | sh -s -- -b /usr/local/bin v3.82.6
          trufflehog --version

      - name: Run trufflehog
        run: trufflehog git "file://." --results=verified --fail
        # --fail exits non-zero on findings; with --results=verified only verified secrets count.
24 changes: 24 additions & 0 deletions .gitleaks.toml
@@ -0,0 +1,24 @@
# Gitleaks configuration. Extends the built-in default rule set and adds
# project-specific allowlisting so test fixtures / sample values don't cause
# false positives. Place at repo root as .gitleaks.toml
[extend]
useDefault = true

[allowlist]
description = "Global allowlist for AssetOpsBench"
paths = [
'''\.secrets\.baseline$''',
'''(.*?)(test|tests|fixtures|examples|docs)(/|\\).*''',
'''.*\.md$''',
]
# Known-safe placeholder values (regex). Add real false positives here.
regexes = [
'''(?i)(your[_-]?api[_-]?key|example|dummy|placeholder|changeme|xxxx+)''',
]

# Example of an extra custom rule (uncomment / adapt as needed):
# [[rules]]
# id = "ibm-cloud-api-key"
# description = "IBM Cloud IAM API key"
# regex = '''(?i)(ibm)?[_-]?api[_-]?key['"\s:=]+[A-Za-z0-9_\-]{44}'''
# keywords = ["api_key", "apikey"]
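Review note: the breadth of the two allowlist patterns above is easier to judge with a few sample inputs. The sketch below approximates them with `grep -E`; this is an assumption, since gitleaks evaluates patterns with Go's regexp engine, and only the second `paths` entry and the `regexes` entry are modelled. The function names and sample strings are hypothetical.

```bash
# Approximates the second `paths` pattern. The leading (.*?) and trailing .*
# are dropped because an unanchored search already matches anywhere in the path.
path_is_allowlisted() {
  printf '%s\n' "$1" | grep -qE '(test|tests|fixtures|examples|docs)(/|\\)'
}

# Approximates the `regexes` entry; -i stands in for the (?i) flag.
value_is_allowlisted() {
  printf '%s\n' "$1" | grep -qiE '(your[_-]?api[_-]?key|example|dummy|placeholder|changeme|xxxx+)'
}

for p in 'src/llm/tests/test_backends.py' 'src/latest/config.py' 'src/agent/runner.py'; do
  if path_is_allowlisted "$p"; then echo "skipped: $p"; else echo "scanned: $p"; fi
done
```

Under this approximation a path such as `src/latest/config.py` is skipped as well, because `latest/` contains `test/`. If that is unintended, requiring a path separator (or start of path) before the directory name would likely narrow the match.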
10 changes: 10 additions & 0 deletions .gitleaksignore
@@ -0,0 +1,10 @@
# Known, triaged historical findings. One gitleaks fingerprint per line.
# A fingerprint here only silences the scanner — any real secret listed
# must be ROTATED/REVOKED at its source first.

# .env.public — intentional public/example value.
f4443296d4565ba82ca3ec19303bc929362185eb:.env.public:generic-api-key:9

# benchmark/docker-compose.yml — leaked GITHUB_TOKEN committed 2025-07-16.
# ACTION REQUIRED: token must be revoked (history cannot be un-leaked).
3ee88a4ef923d3f4729c25eccb0096bc7c805cf2:benchmark/docker-compose.yml:github-pat:9
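Review note: each fingerprint above has the shape `commit:path:rule-id:line`. A minimal sketch for splitting one during triage follows; `parse_fingerprint` is a hypothetical helper, not a gitleaks command, and it assumes the rule id and line number contain no colons.

```bash
# Hypothetical helper: split a gitleaks fingerprint into its four fields.
parse_fingerprint() {
  fp="$1"
  commit="${fp%%:*}"        # everything before the first colon
  line="${fp##*:}"          # everything after the last colon
  middle="${fp#*:}"         # path:rule:line
  middle="${middle%:*}"     # path:rule
  rule="${middle##*:}"
  path="${middle%:*}"
  printf 'commit=%s path=%s rule=%s line=%s\n' "$commit" "$path" "$rule" "$line"
}

parse_fingerprint '3ee88a4ef923d3f4729c25eccb0096bc7c805cf2:benchmark/docker-compose.yml:github-pat:9'
```

The commit and path fields can then be passed to `git show <commit>:<path>` to inspect the flagged line before deciding whether the credential needs rotating.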
27 changes: 27 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,27 @@
# Pre-commit hooks. Install once per clone: pre-commit install
# Run against all files: pre-commit run --all-files
#
# - gitleaks : secret scanning (regex + entropy)
# - detect-secrets : baseline-driven secret scanning (Yelp upstream release, pinned below)
# - block-couchdb-data : blocks new data files under src/couchdb/ (local hook)

repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.21.2
    hooks:
      - id: gitleaks

  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.5.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']

  - repo: local
    hooks:
      - id: block-couchdb-data
        name: Block unapproved data files in src/couchdb
        entry: scripts/check_couchdb_data.sh
        language: script
        pass_filenames: false
        always_run: true
173 changes: 173 additions & 0 deletions .secrets.baseline
@@ -0,0 +1,173 @@
{
"version": "1.5.0",
"plugins_used": [
{
"name": "ArtifactoryDetector"
},
{
"name": "AWSKeyDetector"
},
{
"name": "AzureStorageKeyDetector"
},
{
"name": "Base64HighEntropyString",
"limit": 4.5
},
{
"name": "BasicAuthDetector"
},
{
"name": "CloudantDetector"
},
{
"name": "DiscordBotTokenDetector"
},
{
"name": "GitHubTokenDetector"
},
{
"name": "GitLabTokenDetector"
},
{
"name": "HexHighEntropyString",
"limit": 3.0
},
{
"name": "IbmCloudIamDetector"
},
{
"name": "IbmCosHmacDetector"
},
{
"name": "IPPublicDetector"
},
{
"name": "JwtTokenDetector"
},
{
"name": "KeywordDetector",
"keyword_exclude": ""
},
{
"name": "MailchimpDetector"
},
{
"name": "NpmDetector"
},
{
"name": "OpenAIDetector"
},
{
"name": "PrivateKeyDetector"
},
{
"name": "PypiTokenDetector"
},
{
"name": "SendGridDetector"
},
{
"name": "SlackDetector"
},
{
"name": "SoftlayerDetector"
},
{
"name": "SquareOAuthDetector"
},
{
"name": "StripeDetector"
},
{
"name": "TelegramBotTokenDetector"
},
{
"name": "TwilioKeyDetector"
}
],
"filters_used": [
{
"path": "detect_secrets.filters.allowlist.is_line_allowlisted"
},
{
"path": "detect_secrets.filters.common.is_ignored_due_to_verification_policies",
"min_level": 2
},
{
"path": "detect_secrets.filters.heuristic.is_indirect_reference"
},
{
"path": "detect_secrets.filters.heuristic.is_likely_id_string"
},
{
"path": "detect_secrets.filters.heuristic.is_lock_file"
},
{
"path": "detect_secrets.filters.heuristic.is_not_alphanumeric_string"
},
{
"path": "detect_secrets.filters.heuristic.is_potential_uuid"
},
{
"path": "detect_secrets.filters.heuristic.is_prefixed_with_dollar_sign"
},
{
"path": "detect_secrets.filters.heuristic.is_sequential_string"
},
{
"path": "detect_secrets.filters.heuristic.is_swagger_file"
},
{
"path": "detect_secrets.filters.heuristic.is_templated_secret"
}
],
"results": {
".env.public": [
{
"type": "Secret Keyword",
"filename": ".env.public",
"hashed_secret": "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",
"is_verified": false,
"line_number": 4
}
],
"src/agent/claude_agent/tests/test_runner.py": [
{
"type": "Secret Keyword",
"filename": "src/agent/claude_agent/tests/test_runner.py",
"hashed_secret": "18176482d2532398c7b84c22c6f8d2e59e55505c",
"is_verified": false,
"line_number": 32
}
],
"src/couchdb/docker-compose.yaml": [
{
"type": "Secret Keyword",
"filename": "src/couchdb/docker-compose.yaml",
"hashed_secret": "5baa61e4c9b93f3f0682250b6cf8331b7ee68fd8",
"is_verified": false,
"line_number": 6
}
],
"src/llm/tests/test_backends.py": [
{
"type": "Secret Keyword",
"filename": "src/llm/tests/test_backends.py",
"hashed_secret": "ef219439b755958216dbdf4b1e3b645b1f54565e",
"is_verified": false,
"line_number": 67
}
],
"src/llm/tests/test_routers.py": [
{
"type": "Secret Keyword",
"filename": "src/llm/tests/test_routers.py",
"hashed_secret": "ef219439b755958216dbdf4b1e3b645b1f54565e",
"is_verified": false,
"line_number": 60
}
]
},
"generated_at": "2026-06-16T23:03:00Z"
}
31 changes: 27 additions & 4 deletions CONTRIBUTING.md
@@ -80,15 +80,38 @@ uv run ruff check --fix .

### 3. Security Scanning

-To protect industrial metadata and API keys, run the IBM `detect-secrets` scan:
This repo blocks secrets (API keys, tokens, credentials) at three layers:
a local pre-commit hook, a CI workflow, and GitHub push protection. As a
contributor you only need to set up the local hook once per clone:

```bash
-uv pip install --upgrade "git+[https://github.com/ibm/detect-secrets.git@master#egg=detect-secrets](https://github.com/ibm/detect-secrets.git@master#egg=detect-secrets)"
-detect-secrets scan --update .secrets.baseline
-detect-secrets audit .secrets.baseline
uv pip install pre-commit detect-secrets
pre-commit install
```

After this, **every `git commit` automatically runs gitleaks and
detect-secrets** on your staged changes and aborts the commit if a secret is
found. To scan the whole repo on demand:

```bash
pre-commit run --all-files
```

If you add a new file that legitimately contains an example/placeholder value
flagged as a secret, update the detect-secrets baseline and audit it:

```bash
detect-secrets scan --baseline .secrets.baseline
detect-secrets audit .secrets.baseline
```

Known, already-triaged historical findings are listed in `.gitleaksignore`
(by fingerprint). Never add a *real* secret there — if a live credential is
detected, rotate/revoke it at its source first, then remove it from the code.

> CI runs gitleaks and TruffleHog on every push and pull request, so a secret
> that slips past the local hook should still be caught before merge.

---

## Running Tests & Validation