Skip to content

[REVIEW] sast-config: CWE-matrix false gaps, no CodeQL build-completeness check, baseline-suppression blind spot #1145

@Steven13799

Description

@Steven13799

Skill Being Reviewed

Skill name: sast-config
Skill path: skills/devsecops/sast-config/

False Positive Analysis

Benign code that triggers a false positive:

# .github/codeql/codeql-config.yml — a CodeQL run that REPORTS zero alerts
# while actually analyzing almost nothing, because autobuild silently
# half-compiled a Java module. The skill's CI review would mark this "passing".
queries:
  - uses: security-extended
# (no failure surfaced; database built from 12% of sources)

And a CWE-coverage example:

CWE-352 (CSRF): 0 active SAST rules  -> skill flags this as a High/Medium GAP

Why this is a false positive:

  1. The CWE Top 25 coverage matrix produces false "gap" findings for weakly-SAST-detectable CWEs. The matrix lists CSRF (CWE-352) and Unrestricted Upload (CWE-434) as detectable and ties "CWE Top 10 with zero coverage = High." But CSRF is fundamentally a framework/runtime property that SAST detects poorly and noisily; out-of-bounds write/read and use-after-free (787/125/416) are only meaningfully reachable in C/C++ with heavy analysis. Penalizing a Python/JS team for "0 rules on CWE-352/787" reports gaps for weaknesses SAST cannot reliably cover — a false finding driven by the matrix, not the codebase.
  2. Semgrep ERROR → Critical/High → block merge is a uniform mapping that over-blocks. Severity in Semgrep rules is author-assigned; community/registry rules routinely ship ERROR for medium-real-risk patterns. Section 5.1 and Pitfall intel: ai-security social updates 2026-03-18 #3 hardwire ERROR to "block CI," which mass-blocks on noisy rulesets — directly contradicting the skill's own false-positive-reduction goal.
  3. Blanket test-path exclusion treated as always safe. 3.1 says excluding test/, *.test.js, *.spec.py is "acceptable." But security-relevant logic and vendored third-party code often live under excluded paths, and vendor//node_modules exclusion (correct for first-party SAST) silently drops copied third-party source that is a real attack surface. Presenting these exclusions as unconditionally safe under-reports.

Coverage Gaps

Missed variant 1:

# CodeQL matrix for a compiled language with no build-success verification
strategy:
  matrix:
    language: [java, javascript]
steps:
  - uses: github/codeql-action/autobuild@v3   # may silently fail for java
  - uses: github/codeql-action/analyze@v3     # reports "0 alerts" on a near-empty DB

Why it should be caught:
For compiled languages (Java, C#, C/C++, Kotlin), CodeQL only analyzes code it successfully builds. If autobuild partially fails, the database covers a fraction of the code and analyze cheerfully reports few/zero alerts — a high-severity false negative that looks like success. The Step 6 CI review checks that scans run and are required, but never verifies build success, database completeness, or that every matrix language actually produced a populated database. This is the single most common way CodeQL "passes" while covering nothing.

Missed variant 2:

# semgrep ci with diff-aware/baseline suppression hides pre-existing, unfixed findings forever
semgrep ci --baseline-commit "$(git merge-base origin/main HEAD)"
# A SQLi present before the baseline is never surfaced in any PR.

Why it should be caught:
The skill recommends semgrep ci and PR-scoped scanning but never addresses baseline/diff-aware suppression of known-but-unfixed findings. Pitfall #1 notes incremental scans miss cross-file flows, but the larger issue is that baseline mode permanently hides pre-existing debt: a vulnerability older than the baseline never appears in PR checks and, without a tracked full-scan backlog, is invisible. The review needs a check that scheduled full scans exist and that their findings are tracked to closure, not just that a weekly cron is present.

Edge Cases

  • Unscoped # nosemgrep is a silent global suppression. A bare # nosemgrep (no rule id) disables all rules on that line, not just the intended one. The skill's example correctly scopes the suppression, but it never warns that unscoped suppressions are a coverage hole — exactly what a SAST-config review should flag. Same applies to overly broad query-filters exclude by tag.
  • FP-rate target (<20%) is unmeasurable and perverse. Without ground-truth labeling there is no true denominator; setting a numeric FP target incentivizes disabling noisy-but-valid rules to "hit the number," undermining coverage. Should be framed as a triaged-trend, not a hard threshold.
  • security-and-quality vs the <10-min PR budget. 4.1 recommends security-and-quality for "maximum coverage," but that suite adds maintainability/quality queries that inflate noise and runtime — colliding with the skill's own "under 10 minutes for PR checks" guidance. The trade-off (full suite on schedule, leaner suite on PR) is unstated.

Remediation Quality

  • Fix resolves the vulnerability
  • Fix doesn't introduce new security issues
  • Fix doesn't break functionality
  • Issues found: The CI patterns and custom-rule checklists are strong, but two of the example custom rules themselves illustrate Pitfall Enhance all 45 skills per 7-rule audit evaluation #4 (low coverage):
    • custom.auth.jwt-none-algorithm matches algorithm="none" literally — it misses None/NONE case variants and the algorithm arriving via a variable, so it under-detects the exact bug it targets.
    • custom.auth.hardcoded-admin-bypass matches only if $USER == "admin": return True — it won't catch role-field comparisons, in checks, or != inversions, making it a narrow rule presented as a model.
      Recommended changes: (1) mark weakly-SAST-detectable CWEs (352/434/787/416/125) as "low SAST confidence" in the matrix and don't drive High/Medium gap severity from them; (2) add a CodeQL build-success / database-completeness verification to Step 6; (3) add baseline-suppression backlog tracking and an unscoped-nosemgrep check; (4) condition severity on rule confidence rather than mapping all ERROR → block.

Comparison to Other Tools

Tool Catches this? Notes
Semgrep (managed/Pro) Partial Pro adds cross-file taint and confidence metadata that could drive the severity-by-confidence fix this review recommends.
CodeQL Partial Powerful taint engine, but the build-completeness false-negative is a CodeQL-specific failure mode the skill omits.
SonarQube Partial Good FP-management UX and quality gates; mixes quality+security, echoing the noise trade-off noted here.
Snyk Code (DeepCode) Yes Interfile analysis without a build step — avoids the CodeQL autobuild false-negative for compiled languages.
OpenSSF Scorecard Partial Checks CI hardening (pinned actions, required checks) complementary to this skill's Step 6.

Overall Assessment

Strengths:

  • Genuinely useful CWE Top 25 ↔ rule mapping concept, concrete Semgrep/CodeQL authoring checklists, and solid CI integration patterns with branch-protection and scheduling guidance.
  • Pitfalls are on-point (disable-vs-fix, severity flattening, test custom rules against safe+vulnerable corpora).
  • Clear ASVS mapping and a clean assessment output template.

Needs improvement:

  • Coverage matrix drives false "gap" findings for CWEs SAST detects poorly (CSRF/OOB/UAF).
  • No CodeQL build-success/database-completeness check → silent false negatives that look like passing scans.
  • No handling of baseline/diff-aware suppression hiding pre-existing findings, or of unscoped nosemgrep.
  • Uniform ERROR → block and a hard <20% FP target work against the skill's own FP-reduction aim; two example custom rules are themselves low-coverage.

Priority recommendations:

  1. Add CodeQL build-success/database-completeness verification to the CI review (highest-impact false-negative).
  2. Re-tier the CWE matrix by SAST confidence so weakly-detectable CWEs don't generate gap findings, and map severity by rule confidence rather than raw ERROR.
  3. Add baseline-suppression backlog tracking and an unscoped-nosemgrep coverage-hole check.

Bounty Info

  • I have read and agree to the CONTRIBUTING.md bounty terms
  • Preferred payment method: Crypto, preferably Base USDC to 0x1b58008Fde85832845817F9B24E64d139E418EeF

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions