You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
# .github/codeql/codeql-config.yml — a CodeQL run that REPORTS zero alerts# while actually analyzing almost nothing, because autobuild silently# half-compiled a Java module. The skill's CI review would mark this "passing".queries:
- uses: security-extended# (no failure surfaced; database built from 12% of sources)
And a CWE-coverage example:
CWE-352 (CSRF): 0 active SAST rules -> skill flags this as a High/Medium GAP
Why this is a false positive:
The CWE Top 25 coverage matrix produces false "gap" findings for weakly-SAST-detectable CWEs. The matrix lists CSRF (CWE-352) and Unrestricted Upload (CWE-434) as detectable and ties "CWE Top 10 with zero coverage = High." But CSRF is fundamentally a framework/runtime property that SAST detects poorly and noisily; out-of-bounds write/read and use-after-free (787/125/416) are only meaningfully reachable in C/C++ with heavy analysis. Penalizing a Python/JS team for "0 rules on CWE-352/787" reports gaps for weaknesses SAST cannot reliably cover — a false finding driven by the matrix, not the codebase.
Semgrep ERROR → Critical/High → block merge is a uniform mapping that over-blocks. Severity in Semgrep rules is author-assigned; community/registry rules routinely ship ERROR for medium-real-risk patterns. Section 5.1 and Pitfall intel: ai-security social updates 2026-03-18 #3 hardwire ERROR to "block CI," which mass-blocks on noisy rulesets — directly contradicting the skill's own false-positive-reduction goal.
Blanket test-path exclusion treated as always safe. 3.1 says excluding test/, *.test.js, *.spec.py is "acceptable." But security-relevant logic and vendored third-party code often live under excluded paths, and vendor//node_modules exclusion (correct for first-party SAST) silently drops copied third-party source that is a real attack surface. Presenting these exclusions as unconditionally safe under-reports.
Coverage Gaps
Missed variant 1:
# CodeQL matrix for a compiled language with no build-success verificationstrategy:
matrix:
language: [java, javascript]steps:
- uses: github/codeql-action/autobuild@v3 # may silently fail for java
- uses: github/codeql-action/analyze@v3 # reports "0 alerts" on a near-empty DB
Why it should be caught:
For compiled languages (Java, C#, C/C++, Kotlin), CodeQL only analyzes code it successfully builds. If autobuild partially fails, the database covers a fraction of the code and analyze cheerfully reports few/zero alerts — a high-severity false negative that looks like success. The Step 6 CI review checks that scans run and are required, but never verifies build success, database completeness, or that every matrix language actually produced a populated database. This is the single most common way CodeQL "passes" while covering nothing.
Missed variant 2:
# semgrep ci with diff-aware/baseline suppression hides pre-existing, unfixed findings forever
semgrep ci --baseline-commit "$(git merge-base origin/main HEAD)"# A SQLi present before the baseline is never surfaced in any PR.
Why it should be caught:
The skill recommends semgrep ci and PR-scoped scanning but never addresses baseline/diff-aware suppression of known-but-unfixed findings. Pitfall #1 notes incremental scans miss cross-file flows, but the larger issue is that baseline mode permanently hides pre-existing debt: a vulnerability older than the baseline never appears in PR checks and, without a tracked full-scan backlog, is invisible. The review needs a check that scheduled full scans exist and that their findings are tracked to closure, not just that a weekly cron is present.
Edge Cases
Unscoped # nosemgrep is a silent global suppression. A bare # nosemgrep (no rule id) disables all rules on that line, not just the intended one. The skill's example correctly scopes the suppression, but it never warns that unscoped suppressions are a coverage hole — exactly what a SAST-config review should flag. Same applies to overly broad query-filtersexclude by tag.
FP-rate target (<20%) is unmeasurable and perverse. Without ground-truth labeling there is no true denominator; setting a numeric FP target incentivizes disabling noisy-but-valid rules to "hit the number," undermining coverage. Should be framed as a triaged-trend, not a hard threshold.
security-and-quality vs the <10-min PR budget. 4.1 recommends security-and-quality for "maximum coverage," but that suite adds maintainability/quality queries that inflate noise and runtime — colliding with the skill's own "under 10 minutes for PR checks" guidance. The trade-off (full suite on schedule, leaner suite on PR) is unstated.
custom.auth.jwt-none-algorithm matches algorithm="none" literally — it misses None/NONE case variants and the algorithm arriving via a variable, so it under-detects the exact bug it targets.
custom.auth.hardcoded-admin-bypass matches only if $USER == "admin": return True — it won't catch role-field comparisons, in checks, or != inversions, making it a narrow rule presented as a model.
Recommended changes: (1) mark weakly-SAST-detectable CWEs (352/434/787/416/125) as "low SAST confidence" in the matrix and don't drive High/Medium gap severity from them; (2) add a CodeQL build-success / database-completeness verification to Step 6; (3) add baseline-suppression backlog tracking and an unscoped-nosemgrep check; (4) condition severity on rule confidence rather than mapping all ERROR → block.
Comparison to Other Tools
Tool
Catches this?
Notes
Semgrep (managed/Pro)
Partial
Pro adds cross-file taint and confidence metadata that could drive the severity-by-confidence fix this review recommends.
CodeQL
Partial
Powerful taint engine, but the build-completeness false-negative is a CodeQL-specific failure mode the skill omits.
SonarQube
Partial
Good FP-management UX and quality gates; mixes quality+security, echoing the noise trade-off noted here.
Snyk Code (DeepCode)
Yes
Interfile analysis without a build step — avoids the CodeQL autobuild false-negative for compiled languages.
OpenSSF Scorecard
Partial
Checks CI hardening (pinned actions, required checks) complementary to this skill's Step 6.
Overall Assessment
Strengths:
Genuinely useful CWE Top 25 ↔ rule mapping concept, concrete Semgrep/CodeQL authoring checklists, and solid CI integration patterns with branch-protection and scheduling guidance.
Pitfalls are on-point (disable-vs-fix, severity flattening, test custom rules against safe+vulnerable corpora).
Clear ASVS mapping and a clean assessment output template.
No CodeQL build-success/database-completeness check → silent false negatives that look like passing scans.
No handling of baseline/diff-aware suppression hiding pre-existing findings, or of unscoped nosemgrep.
Uniform ERROR → block and a hard <20% FP target work against the skill's own FP-reduction aim; two example custom rules are themselves low-coverage.
Priority recommendations:
Add CodeQL build-success/database-completeness verification to the CI review (highest-impact false-negative).
Re-tier the CWE matrix by SAST confidence so weakly-detectable CWEs don't generate gap findings, and map severity by rule confidence rather than raw ERROR.
Add baseline-suppression backlog tracking and an unscoped-nosemgrep coverage-hole check.
Skill Being Reviewed
Skill name:
sast-configSkill path:
skills/devsecops/sast-config/False Positive Analysis
Benign code that triggers a false positive:
And a CWE-coverage example:
Why this is a false positive:
Semgrep ERROR → Critical/High → block mergeis a uniform mapping that over-blocks. Severity in Semgrep rules is author-assigned; community/registry rules routinely shipERRORfor medium-real-risk patterns. Section 5.1 and Pitfall intel: ai-security social updates 2026-03-18 #3 hardwire ERROR to "block CI," which mass-blocks on noisy rulesets — directly contradicting the skill's own false-positive-reduction goal.test/,*.test.js,*.spec.pyis "acceptable." But security-relevant logic and vendored third-party code often live under excluded paths, andvendor//node_modulesexclusion (correct for first-party SAST) silently drops copied third-party source that is a real attack surface. Presenting these exclusions as unconditionally safe under-reports.Coverage Gaps
Missed variant 1:
Why it should be caught:
For compiled languages (Java, C#, C/C++, Kotlin), CodeQL only analyzes code it successfully builds. If
autobuildpartially fails, the database covers a fraction of the code andanalyzecheerfully reports few/zero alerts — a high-severity false negative that looks like success. The Step 6 CI review checks that scans run and are required, but never verifies build success, database completeness, or that every matrix language actually produced a populated database. This is the single most common way CodeQL "passes" while covering nothing.Missed variant 2:
Why it should be caught:
The skill recommends
semgrep ciand PR-scoped scanning but never addresses baseline/diff-aware suppression of known-but-unfixed findings. Pitfall #1 notes incremental scans miss cross-file flows, but the larger issue is that baseline mode permanently hides pre-existing debt: a vulnerability older than the baseline never appears in PR checks and, without a tracked full-scan backlog, is invisible. The review needs a check that scheduled full scans exist and that their findings are tracked to closure, not just that a weekly cron is present.Edge Cases
# nosemgrepis a silent global suppression. A bare# nosemgrep(no rule id) disables all rules on that line, not just the intended one. The skill's example correctly scopes the suppression, but it never warns that unscoped suppressions are a coverage hole — exactly what a SAST-config review should flag. Same applies to overly broadquery-filtersexcludeby tag.security-and-qualityvs the <10-min PR budget. 4.1 recommendssecurity-and-qualityfor "maximum coverage," but that suite adds maintainability/quality queries that inflate noise and runtime — colliding with the skill's own "under 10 minutes for PR checks" guidance. The trade-off (full suite on schedule, leaner suite on PR) is unstated.Remediation Quality
custom.auth.jwt-none-algorithmmatchesalgorithm="none"literally — it missesNone/NONEcase variants and the algorithm arriving via a variable, so it under-detects the exact bug it targets.custom.auth.hardcoded-admin-bypassmatches onlyif $USER == "admin": return True— it won't catch role-field comparisons,inchecks, or!=inversions, making it a narrow rule presented as a model.Recommended changes: (1) mark weakly-SAST-detectable CWEs (352/434/787/416/125) as "low SAST confidence" in the matrix and don't drive High/Medium gap severity from them; (2) add a CodeQL build-success / database-completeness verification to Step 6; (3) add baseline-suppression backlog tracking and an unscoped-
nosemgrepcheck; (4) condition severity on rule confidence rather than mapping all ERROR → block.Comparison to Other Tools
Overall Assessment
Strengths:
Needs improvement:
nosemgrep.ERROR → blockand a hard <20% FP target work against the skill's own FP-reduction aim; two example custom rules are themselves low-coverage.Priority recommendations:
nosemgrepcoverage-hole check.Bounty Info
0x1b58008Fde85832845817F9B24E64d139E418EeF