r2c-CSE/scan-results

Semgrep MR → HTML findings report

Note: This is an unofficial tool, provided as-is. If you have improvements to share, feel free to collaborate!

A command-line tool. Given a merge/pull request or branch in a repository scanned by Semgrep, it calls the Semgrep Findings and Secrets APIs, keeps the rows whose Git ref matches the patterns your SCM uses for MRs, and writes an HTML report (and, optionally, JSON). When an MR number is present, it can also look up a matching CI scan in the last ~30 days of Semgrep’s scan history.

Requires a Semgrep Team or Enterprise API token.


Requirements

  • Python 3.10+ (recommended; the code uses modern typing syntax).
  • Dependencies: httpx (see requirements.txt).

Setup

python3 -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt

Authentication and environment

| Name | Required | Description |
| --- | --- | --- |
| SEMGREP_APP_TOKEN or SEMGREP_API_TOKEN | Yes | Bearer token for the Semgrep API. Either variable is accepted. |
| SEMGREP_DEPLOYMENT_SLUG | No | Default deployment slug if you omit --deployment-slug. |

export SEMGREP_APP_TOKEN='...'
# optional:
export SEMGREP_DEPLOYMENT_SLUG='your-deployment'
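As a rough illustration of the "either variable is accepted" rule, token lookup could be sketched as below. The function names and the lookup order (SEMGREP_APP_TOKEN first) are assumptions made for this example; the README only states that both variables work.

```python
import os


def resolve_token(env=None):
    """Return the first non-empty token found, or None if neither variable is set."""
    env = os.environ if env is None else env
    # Assumption: SEMGREP_APP_TOKEN is checked first; the actual precedence is not documented.
    for name in ("SEMGREP_APP_TOKEN", "SEMGREP_API_TOKEN"):
        value = env.get(name, "").strip()
        if value:
            return value
    return None


def auth_headers(token):
    """Build the Bearer authorization header for Semgrep API requests."""
    return {"Authorization": f"Bearer {token}"}
```

A missing token would then surface as a usage error before any request is made, for example by checking `resolve_token() is None` at startup.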

What you can pass as input

Use one of these patterns (do not mix a URL with --project / --mr / --branch).

| Pattern | Example |
| --- | --- |
| MR/PR URL | python cli.py 'https://gitlab.com/org/repo/-/merge_requests/42' |
| Project + MR number | python cli.py --project org/repo --mr 42 --scm gitlab |
| Project + branch | python cli.py --project org/repo --branch my-feature |

  • With --project + --mr only, --scm auto defaults to GitHub ref patterns. For GitLab MRs without a URL, set --scm gitlab.
  • --source-branch NAME (MR mode only): Semgrep often stores findings on refs/heads/<branch> instead of refs/merge-requests/<iid>/head. Use when results are empty or wrong.
  • --branch accepts a short branch name or a full ref such as refs/heads/main.
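To show why a URL carries enough information to skip --scm, here is a minimal sketch of splitting an MR/PR URL into SCM, project, and number. The `MRTarget` type, `parse_mr_url` function, and both regular expressions are hypothetical and written for this example; the tool's own parse_input.py may work differently.

```python
import re
from typing import NamedTuple


class MRTarget(NamedTuple):
    scm: str
    project: str
    number: int


# Hypothetical patterns covering the two URL shapes used in the examples in this README.
_GITLAB = re.compile(r"^https?://[^/]+/(?P<project>.+?)/-/merge_requests/(?P<num>\d+)")
_GITHUB = re.compile(r"^https?://[^/]+/(?P<project>[^/]+/[^/]+)/pull/(?P<num>\d+)")


def parse_mr_url(url):
    """Return (scm, project, number) for a GitLab MR or GitHub PR URL."""
    for scm, pattern in (("gitlab", _GITLAB), ("github", _GITHUB)):
        match = pattern.match(url)
        if match:
            return MRTarget(scm, match.group("project"), int(match.group("num")))
    raise ValueError(f"not a recognized MR/PR URL: {url}")
```

With only --project and --mr there is no URL shape to inspect, which is consistent with the need to pass --scm gitlab explicitly for GitLab MRs.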

Outputs

| Artifact | Flag / default |
| --- | --- |
| HTML report | --output-html (default: semgrep-report.html). Skip with --no-html. |
| Full run JSON | --output-json PATH: the same content model the HTML is built from. |
| Aggregate-only JSON | --aggregate-json-out PATH: summary/counts payload from aggregation. |

The HTML report includes a results summary (by product, severity, ref, and top rules), a findings table (truncated; see below), run context, reference links (sample API URLs), and optional CI scan metadata.


Defaults (quick reference)

| Behavior | Default |
| --- | --- |
| HTML | Written to semgrep-report.html (--no-html to skip). |
| Products | all: SAST + SCA + Secrets. |
| Aggregate Findings/Secrets | On (--no-aggregate-findings to disable). |
| Repo-wide discovery | On with aggregate (--no-repo-discover = only explicit ref candidates). |
| MR CI scan lookup | On when an MR/PR number exists (--no-scan-search to skip). |
| Findings/Secrets since | Not sent, so full history is returned (--no-all-time or --since-years / --since-date to bound). |
| Bounded window length | With --no-all-time only and no --since-date: ~3 years before “until”. |
| Findings API dedup | false (--dedup for cross-ref dedup). |
| Rows in HTML/JSON findings table | 500 (--report-findings-limit N; 0 = no per-finding rows; summary counts unchanged). |
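The row limit only affects the per-finding table, not the summary. A minimal sketch of that behavior, assuming each finding is a dict with a `severity` key (a hypothetical shape used only for illustration):

```python
from collections import Counter


def build_report(findings, limit=500):
    """Summarize all findings, but keep at most `limit` rows for the table."""
    # Counts are computed over every finding, before any truncation.
    summary = Counter(f["severity"] for f in findings)
    # limit == 0 means no per-finding rows at all.
    rows = findings[:limit] if limit > 0 else []
    return {
        "summary": dict(summary),
        "rows": rows,
        "truncated": len(rows) < len(findings),
    }
```

Because the summary is derived before truncation, a report with `--report-findings-limit 0` would still show complete counts.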

Time window (Findings and Secrets APIs)

Lower and upper bounds are independent of “which MR” — they only control what the APIs return for the repository (then the tool filters to your refs).

| Your flags | Effect |
| --- | --- |
| (none) | Full history: no since on Findings/Secrets. --until-date still caps rows client-side when set. |
| --no-all-time | Sends since: rolling window of ~3 years before now (or before --until-date if set), unless overridden below. |
| --since-years N | Bounded window: since = N years before “until”. Implies a bounded window even if you did not pass --no-all-time. Ignored if --since-date is set. |
| --since-date | Calendar lower bound (YYYY-MM-DD or ISO-8601 UTC). |
| --until-date | Upper bound (default: now). Findings with parseable timestamps after this are dropped after fetch. |

Notes:

  • When since is sent, Semgrep applies it against relevant_since server-side (see API docs). This tool does not re-apply since client-side on created_at, so re-detected findings are not dropped for that reason.
  • Full history can be slow or large on huge repos; use --no-all-time, --since-years, or --discover-max-findings to limit work.
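The precedence between the time-window flags can be sketched as a single function. This is an illustrative reading of the rules above, not the tool's code: the function names are hypothetical, and a "year" is approximated as 365 days because the README only says "~3 years".

```python
from datetime import datetime, timedelta, timezone

DEFAULT_WINDOW_YEARS = 3  # the "~3 years" rolling window described above


def resolve_since(*, no_all_time=False, since_years=None, since_date=None, until=None):
    """Return the lower bound to send as `since`, or None for full history."""
    until = until or datetime.now(timezone.utc)
    if since_date is not None:
        return since_date  # an explicit calendar bound wins over --since-years
    if since_years is not None:
        # --since-years implies a bounded window even without --no-all-time.
        return until - timedelta(days=365 * since_years)
    if no_all_time:
        return until - timedelta(days=365 * DEFAULT_WINDOW_YEARS)
    return None  # default: `since` is not sent


def within_until(timestamp, until):
    """Client-side upper bound: keep rows with no parseable timestamp."""
    return timestamp is None or timestamp <= until
```

Note that only the upper bound is applied client-side; the lower bound is left to the server, which matches the note about re-detected findings not being dropped.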

Command-line reference

Run python cli.py --help for the canonical list. Summary by topic:

Target and SCM

| Flag | Description |
| --- | --- |
| url | Optional GitHub PR or GitLab MR URL. |
| --project | Semgrep project name, usually owner/repo (required if no URL). |
| --mr | Merge/pull request number. |
| --branch | Branch name or full ref. |
| --source-branch | With MR: source branch name (GitLab / ref mismatch). |
| --scm | auto (default), github, or gitlab. |

Semgrep account

| Flag | Description |
| --- | --- |
| --org-slug | Org slug for UI links and synthesized per-finding URLs (https://semgrep.dev/orgs/<slug>/findings/...). |
| --deployment-slug | Deployment to use (default: env or first available). |

Findings aggregation and discovery

| Flag | Description |
| --- | --- |
| --aggregate-findings / --no-aggregate-findings | Paginate Findings/Secrets for the MR or branch (default: on). |
| --no-repo-discover | Only query explicit ref candidates; faster, but more likely to miss rows. |
| --dedup / --no-dedup | Findings API dedup=true (default: off). |
| --discover-max-findings N | Cap rows per product during repo-wide listing (0 = no cap). |

Products

| --products | Meaning |
| --- | --- |
| all | SAST + SCA + Secrets (default). |
| code | SAST + SCA only. |
| sast | SAST only. |
| sca | SCA only. |
| secrets | Secrets only. |
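The --products values amount to named sets of products. A small sketch of that mapping follows; the `PRODUCT_SETS` table and helper names are hypothetical and only restate the table above.

```python
# Each --products value expands to the set of products it covers.
PRODUCT_SETS = {
    "all": {"sast", "sca", "secrets"},
    "code": {"sast", "sca"},
    "sast": {"sast"},
    "sca": {"sca"},
    "secrets": {"secrets"},
}


def expand_products(value="all"):
    """Return the set of products selected by a --products value."""
    try:
        return PRODUCT_SETS[value]
    except KeyError:
        raise ValueError(f"unknown --products value: {value!r}") from None


def uses_secrets_api(value):
    """True when the selection requires calling the Secrets API."""
    return "secrets" in expand_products(value)
```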

CI scan history (separate from Findings DB)

| Flag | Description |
| --- | --- |
| --scan-search / --no-scan-search | Search recent scan jobs (~30 days) for this MR (default: on when MR number exists). |

Reporting and UI

| Flag | Description |
| --- | --- |
| --output-html | HTML output path. |
| --no-html | Do not write HTML. |
| --output-json | Write full run report JSON. |
| --aggregate-json-out | Write aggregate summary JSON only. |
| --report-findings-limit | Max findings rows in HTML/JSON table (default 500; 0 = table empty, counts remain). |
| --no-color | Disable ANSI colors. |

Debugging

| Flag | Description |
| --- | --- |
| --verbose / -v | Extra Findings API probe output on stderr when results are empty. |
| --debug | Verbose HTTP/pagination/filter trace on stderr. |
| --debug-log PATH | Same as --debug, also appended to a file. |

Examples

# MR URL → semgrep-report.html (full history by default)
python cli.py 'https://gitlab.com/org/repo/-/merge_requests/42'

# GitLab project + MR (no URL): set SCM explicitly
python cli.py --project org/repo --mr 42 --scm gitlab

# Branch only
python cli.py --project org/repo --branch feature/x

# Custom HTML path; code products only; 1-year lower bound
python cli.py 'https://github.com/org/repo/pull/99' \
  --output-html /tmp/out.html --products code --since-years 1

# Rolling 3-year window (explicit opt-in to bounded mode)
python cli.py 'https://gitlab.com/org/repo/-/merge_requests/42' --no-all-time

# Fixed calendar window (UTC)
python cli.py --project org/repo --mr 42 --scm gitlab \
  --since-date 2023-01-01 --until-date 2026-03-31

# Full run JSON + HTML
python cli.py 'https://gitlab.com/org/repo/-/merge_requests/42' --output-json run.json

# Deep diagnosis
python cli.py 'https://gitlab.com/org/repo/-/merge_requests/42' --verbose --debug

How collection works (short)

  1. The tool resolves ref candidates (e.g. refs/merge-requests/<iid>/head, refs/heads/<branch>).
  2. With repo-wide discovery (default), it lists Findings for the repository (and optional since), then keeps rows whose ref matches your MR/branch. --no-repo-discover skips the wide listing and only hits those candidate refs.
  3. Secrets are fetched with compatible time filters and merged into the same report model.
  4. CI scan lookup is optional metadata from scan search; it does not replace Findings.
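Steps 1 and 2 above can be sketched as building a list of candidate refs and then filtering rows against it. This is a simplified illustration: the function names are hypothetical, each finding is assumed to be a dict with a `ref` key, and the GitHub pattern `refs/pull/<n>/merge` is an assumption, since this README only spells out the GitLab and branch patterns.

```python
def ref_candidates(scm, mr=None, branch=None, source_branch=None):
    """List the Git refs a finding may be stored under for this MR or branch."""
    refs = []
    if mr is not None:
        if scm == "gitlab":
            refs.append(f"refs/merge-requests/{mr}/head")
        else:
            # Assumption: GitHub-style pull request ref.
            refs.append(f"refs/pull/{mr}/merge")
    for name in (source_branch, branch):
        if name:
            # Accept either a short branch name or a full ref.
            refs.append(name if name.startswith("refs/") else f"refs/heads/{name}")
    return refs


def filter_by_ref(findings, candidates):
    """Keep only the rows whose ref is one of the candidate refs."""
    wanted = set(candidates)
    return [f for f in findings if f.get("ref") in wanted]
```

Under this reading, --source-branch simply adds one more candidate, which is why it helps when findings were stored on refs/heads/<branch> rather than on the MR ref.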

Limitations

  • ListFindings prefers repository_ids; naming fallbacks apply when IDs are missing.
  • Zero findings in the report but issues in the UI: try --verbose, --source-branch, --dedup, confirm --org-slug, and see Time window above.
  • CI scan search is only ~30 days; findings can be much older.
  • Semgrep may omit some per-finding URLs; the report uses whatever fields exist (e.g. line_of_code_url).

Project files

cli.py · cli_term.py · semgrep_client.py · parse_input.py · ref_candidates.py · aggregate_mr_findings.py · findings_multiproduct.py · report_export.py · debug_log.py


Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Success |
| 2 | Usage error (bad arguments, missing token, invalid combinations) |
| 3 | Not found (e.g. deployment or project not visible to the token) |
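When calling the tool from another script or a CI job, the exit code can be mapped back to these meanings. The wrapper below is a hypothetical sketch: `describe_exit` and `run_report` are names invented for this example, and it assumes cli.py is in the current working directory.

```python
import subprocess
import sys

# Exit codes as documented in the table above.
EXIT_MEANINGS = {
    0: "success",
    2: "usage error",
    3: "not found",
}


def describe_exit(code):
    """Translate an exit code into a short label."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_report(args):
    """Run cli.py with the given arguments and return (code, label)."""
    proc = subprocess.run([sys.executable, "cli.py", *args])
    return proc.returncode, describe_exit(proc.returncode)
```

For example, `run_report(["--project", "org/repo", "--mr", "42", "--scm", "gitlab"])` would return the code together with its label, so a CI step can fail with a clearer message.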
