Skip to content

fix(security): SEC010 FP on secrets.choice / os.urandom#9

Merged
wei9072 merged 1 commit into
mainfrom
fix/sec010-secrets-fp
May 6, 2026
Merged

fix(security): SEC010 FP on secrets.choice / os.urandom#9
wei9072 merged 1 commit into
mainfrom
fix/sec010-secrets-fp

Conversation

@wei9072
Copy link
Copy Markdown
Owner

@wei9072 wei9072 commented May 6, 2026

Summary

Round 8 of the codex agent experiment (gpt-5.4-mini × Plan A
ambiguous spec) surfaced a false positive that PR #6's SEC010
expansion introduced (or made far more visible): the SECURE choice,
`secrets.choice`, was being flagged as weak RNG.

The bug:

import secrets, string
ALPHANUMERIC = string.ascii_letters + string.digits
def make_code():
    return "".join(secrets.choice(ALPHANUMERIC) for _ in range(8))

`secrets.choice` shares the function name `choice` with
`random.choice`. The match was on the bare last segment, so both
fired. Combined with PR #6's opaque-id heuristic (which detects
`string.ascii_letters` in the enclosing function), every URL
shortener using `secrets` got flagged as if it were using
`random`. Active danger: the agent might "fix" secure code into
something worse.

Fix: add a known-secure module allowlist at the top of
`check_weak_random_for_token`:

  • Python: `secrets.`, `os.urandom`
  • Node / browser: `crypto.`, `window.crypto.`, `self.crypto.`,
    `subtle.`
  • Go: `crypto/`, `rand.read`, `rand.int` (for crypto/rand)
  • Other: `nacl.`, `tls.`

If the call name contains any of these, return early — even if the
last segment matches a weak-RNG name.

Test plan

  • `cargo test --workspace` — 132 / 132 pass (was 130; +2 new
    regression tests)
  • Live MCP run on the FP shape: previously SEC010 fires,
    now silent
  • Existing positive tests still fire (`random.choice` / Math.random
    for token still flagged)
  • CI green on push

🤖 Generated with Claude Code

Round 8 codex experiment surfaced an FP that PR #6's SEC010
expansion introduced (or made far more visible): `secrets.choice`
on a character pool was getting flagged as weak RNG, even though
`secrets.choice` is the SECURE choice — the same `choice` function
name shared with `random.choice` was matching unconditionally.

The shape that fired falsely:

    import secrets, string
    ALPHANUMERIC = string.ascii_letters + string.digits
    def make_code():
        return "".join(secrets.choice(ALPHANUMERIC) for _ in range(8))

Receiver module is `secrets`, not `random` — should pass.

Fix: at the top of `check_weak_random_for_token`, check the full
call name against an allowlist of known-cryptographic RNG modules
(`secrets.`, `os.urandom`, `crypto.`, `nacl.`, `subtle.`,
`rand.read`/`rand.int` for Go crypto/rand, `window.crypto.`,
`self.crypto.`). Match anywhere in the name to handle dotted /
slashed forms.

Tests: 2 new — `secrets.choice` over ascii pool (was the FP from
the experiment), `os.urandom` for token (different shape, same
class).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@wei9072 wei9072 merged commit 005e170 into main May 6, 2026
1 check passed
@wei9072 wei9072 deleted the fix/sec010-secrets-fp branch May 6, 2026 02:08
wei9072 added a commit that referenced this pull request May 6, 2026
… dispatch (#10)

* feat(aegis-core): ast::imports — per-file Import struct + extractor

Introduces the first per-file Layer 1 fact-extraction module. Every
language adapter declares an `import_query` capturing module names
as `@import`; until now signals/imports_local.rs and workspace.rs
each ran that query independently. This pulls the common pattern
into one place so future per-file fact derivation (security
receiver resolution, public symbols, etc.) has the same home.

`extract_imports(parsed) -> Vec<Import>` runs the adapter query
once and returns normalized module strings with line numbers.
Pure function; caching is the next commit's job (`ParsedFile.imports()`
lazy cache).

5 tests cover Python plain/from-imports, Go quoted module paths,
JS/TS quote stripping, line numbering (1-indexed), and the contract
that callers must hold a ParsedFile (unparseable extensions can't
reach this code path).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(aegis-core): ParsedFile.imports() lazy cache + resolve_receiver()

Adds an `OnceCell<Vec<Import>>` to ParsedFile so the import query
runs at most once per file. Single-threaded by design — ParsedFile
is owned by gather_findings and not shared across threads, so
std::cell::OnceCell suffices.

New API:
- `ParsedFile::imports() -> &[Import]` — lazy cache populated on
  first call.
- `ParsedFile::resolve_receiver(name) -> Option<&Import>` — best-
  effort receiver-to-import lookup. Today: last-segment of module
  path or full module name. Alias-aware resolution (Go
  `import myrand "math/rand"`) is intentionally deferred — needs
  language-specific AST shapes.

5 new tests cover cache population, cache stability across calls
(identical slice pointer), Python module-name resolution, Go
last-segment path resolution (`math/rand` → `rand`), and the
empty-string boundary.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* refactor(aegis-core): migrate consumers to ParsedFile.imports() cache

Both signals/imports_local.rs and workspace.rs::summarize_file used
to run their own copy of:

    let query = Query::new(lang, adapter.import_query()).ok()?;
    let mut qc = QueryCursor::new();
    for m in qc.matches(&query, parsed.root_node(), src) {
        for cap in m.captures { ... adapter.normalize_import(text) ... }
    }

That work now lives in ParsedFile.imports() and runs at most once
per file. Both call sites collapse to a single iterator over
`parsed.imports()`.

Behavioural delta: none. Both consumers were already de-duplicating
on the module string (HashSet in workspace.rs, line-counting in
imports_local.rs); the cache returns a `Vec<Import>` with each
distinct (module, line) pair, so consumer behaviour is preserved.

Drops three unused tree-sitter type imports from workspace.rs
(Parser, Query, QueryCursor) — they were only used by the now-gone
inline import-extraction. The pre-existing unused `ParsedFile`
import in findings.rs is separate scope; left alone.

142 / 142 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(security): SEC010 language-aware dispatch over Layer 1 imports

Replaces the polyglot needle list (`choice` | `randint` | ... |
`Math.random`) with per-language matchers. Each language's matcher
hard-codes the receiver path, so safe APIs that share function
names (`secrets.choice` shares `choice`, `SecureRandom.nextInt`
shares `nextInt`, `crypto/rand.Read` shares `Read`) never match.

Per-language coverage:

- **Python**: `random.X` only (`X` ∈ choice/choices/randint/sample/
  uniform/randrange/shuffle/random). Excludes `secrets.X`,
  `os.urandom`, `np.random.X`, `random.SystemRandom().X`.
- **JS / TS**: `Math.random` only.
- **Go**: `math/rand` method set, with import-resolution via
  `parsed.resolve_receiver()` to disambiguate from `crypto/rand`.
  `import "math/rand"` → `rand.Intn` flags; `import "crypto/rand"` →
  `rand.Read` doesn't (and `Intn`/`Int31`/etc. don't even exist
  there). Falls back to method-name conservative default when no
  matching import is in the file.
- **Java / Kotlin**: `nextInt`/`nextLong`/etc. on `Random` (or
  unqualified). `SecureRandom`-prefixed receivers excluded by
  receiver-path filter.
- **C#**: `Next`/`NextDouble`/`NextBytes`. Excludes
  `RandomNumberGenerator` and any `Cryptography`-namespace receiver.
- **PHP**: global `rand`/`mt_rand`/`array_rand`. `random_int` and
  `random_bytes` are CSPRNG and intentionally absent.
- **Rust**: `gen`/`gen_range`/`gen_bool` from the rand crate.
  `OsRng` excluded by receiver-path filter.

Drops PR #9's `safe_module_prefixes` whitelist hack — now
redundant. Each language's matcher hard-codes the receiver path,
so `secrets.choice` never reaches the SEC010 path in the first
place because `is_python_weak_rng` requires `first == "random"`.

Threads `&ParsedFile` down through `walk()` so language identity
and import facts are available at the SEC010 call site. Other SEC
checks unchanged.

Existing 142 tests pass without modification — the dispatch
preserves semantics for everything that was already flagging.
Per-language regression tests added in the next commit.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* test(security): SEC010 per-language regression coverage

10 new tests covering the dispatch matrix from PR #10 step 4:

Python:
  - random.SystemRandom() CSPRNG-backed → does NOT fire
  - numpy np.random.choice (statistical sampling) → does NOT fire

JS / TS:
  - Math.random for token (chained .toString[2] form) → fires
  - window.crypto.getRandomValues → does NOT fire

Go (with Layer 1 import resolution):
  - import "math/rand" + rand.Intn → fires
  - import "crypto/rand" + rand.Read → does NOT fire (same literal
    `rand.Read` text, different module per import resolution)

Java:
  - new Random().nextInt(...) literal → fires
  - new SecureRandom().nextInt(...) literal → does NOT fire
  Note: variable-stored (`Random r = new Random(); r.nextInt`) is a
  known FN — without dataflow we can't recover the class identity
  from the receiver `r`. Worth less than the FP risk on
  `SecureRandom r; r.nextInt`.

C# / PHP:
  - new Random().Next(...) → fires
  - PHP rand($min, $max) → fires
  - PHP random_int($min, $max) (CSPRNG) → does NOT fire

Also fixes call-name extraction for Java's `method_invocation`,
which lacks a `function` field — composes `object.name` from the
two child fields so the receiver path is visible to the matcher.
And extends `enclosing_token_context` to recognize Go (var_spec /
short_var_declaration), Java (local_variable_declaration), and JS
augmented assignment kinds, so the token-context detection works
across the languages now in scope.

153 / 153 tests pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
wei9072 added a commit that referenced this pull request May 7, 2026
Adds the empirical validation work that drove PRs #6#12 into
the repo as a reproducibility archive under `experiments/`.

Contents:

- `comparison-report.md` (1199 lines, rolling Round 1 → Round 9
  analysis). Documents the three Aegis ROI mechanisms surfaced
  from data:
    1. Rule-hit → fix (brownfield Plan B: 0/3 → 3/3 across 3 models)
    2. Structural guardrail (cycle / public_symbol_removed —
       0/14 hits, dead weight on clean architectures)
    3. Anti-paralysis ritual (weak models complete tasks they
       would otherwise abandon)
- 4 starting-code fixtures (Python brownfield, Go brownfield,
  Java brownfield, Python multi-module).
- 11 prompt files. Each task has paired `-a.txt` (no Aegis) and
  `-b.txt` (with Aegis MCP + REQUIRED-workflow ritual instruction).
- 52 round directories: per-model deliverables + `run.log` for
  codex-driven rounds. Naming: <model>-<task>-<a|b>.
- `aegis_validate.py` — Python wrapper around aegis-mcp stdio
  JSON-RPC, used by agents to run validation after each write.
- 3 eval scripts that diff each round's deliverables against the
  planted SEC bugs.

Excluded via `.gitignore` and rsync filter (would have been ~970MB
of bloat): venv / .venv / __pycache__ / .pytest_cache / .toolchain
(Go toolchain copies codex downloads) / compiled binaries / git
metadata of nested repos. Final archive: 17MB.

Direct lineage from this archive into Aegis code:

| Round | Surfaced | Fixed in |
|---|---|---|
| Round 8 codex | SEC010 FP on `secrets.choice` | PR #9 |
| Round 9 Go / Java | SEC009 multi-language coverage = 0 | PR #12 |
| Round 9 Java | SEC010 inner-block `break` hid production case | PR #11 |

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
wei9072 added a commit that referenced this pull request May 7, 2026
#14)

- Main README.md and README.zh-TW.md: new "Experiments archive" /
  「實驗資料」 section linking to experiments/README.md
- experiments/README.md: replaces the prose-only intro with four
  analysis charts:
    1. Round structure overview (52 dirs, 26 paired, 11 models)
    2. Plan B brownfield 0/3 → 3/3 fix-rate matrix with per-model
       remaining-bug counts (real numbers from re-validating the
       archive against current rule library)
    3. Plan C multi-module task-completion matrix surfacing the
       anti-paralysis ROI mechanism (g54mini-mc-a abandoned vs
       g54mini-mc-b completed same task)
    4. Mermaid flowchart of the three Aegis ROI mechanisms
       (rule-hit → fix; structural guardrail; anti-paralysis ritual)
- Plus a "Direct lineage" chart connecting each experiment finding
  to the PR that fixed it (Round 8 → PR #9; Round 9 → PR #11/#12).

Also re-imports `experiments/31flashlite-amb-b/` as plain files —
the previous commit accidentally captured it as a gitlink because
codex had created a nested `.git/` inside the round dir during the
agent run. Removed the nested `.git/` and re-added the 6 actual
deliverable files.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant