Merged
5 changes: 4 additions & 1 deletion docs/BENCHMARKS.md
@@ -11,7 +11,7 @@ The chart below uses a log-log scatter plot: file count on the x-axis, wall-cloc

![Scan duration vs. file count for Provenant and ScanCode](benchmarks/scan-duration-vs-files.svg)

-> Provenant is faster on 130 of 132 recorded runs, with a **11.5× median speedup** and **9.9× geometric-mean speedup** overall; the median gap grows from **6.4×** on sub-100-file targets to **19.7×** on 10k+ file targets.
+> Provenant is faster on 133 of 135 recorded runs, with a **11.7× median speedup** and **10.1× geometric-mean speedup** overall; the median gap grows from **6.4×** on sub-100-file targets to **19.7×** on 10k+ file targets.
> Generated from the benchmark timing rows in this document via `cargo run --manifest-path xtask/Cargo.toml --bin generate-benchmark-chart`.

## Current benchmark examples
@@ -184,6 +184,9 @@ The tables below provide the per-target detail behind the chart. Each row is one
| [nix-community/dream2nix @ 69eb01f](https://github.com/nix-community/dream2nix/tree/69eb01fa0995e1e90add49d8ca5bcba213b0416f)<br>515 files | 2026-04-12 · dream2nix-60485 · macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc | Provenant: 19.91s<br>ScanCode: 33.50s<br>**1.68× faster (-40.6%)** | Broader Nix package and dependency extraction (`53` vs `22` packages, `887` vs `843` dependencies) from committed `flake.lock` inputs and flake-compat-backed `default.nix` wrapper surfaces across the tree, with cleaner root-package visibility on repository entrypoints that ScanCode leaves unassembled |
| [NixOS/nix @ 262e98f](https://github.com/NixOS/nix/tree/262e98f67e09f83393dc84c2629df84cce2fe299)<br>2,889 files | 2026-04-11 · nix-94957 · macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc | Provenant: 21.86s<br>ScanCode: 104.41s<br>**4.78× faster (-79.1%)** | Broader Nix package and dependency extraction (`2` vs `0` packages, `67` vs `0` dependencies) from committed `flake.lock` inputs and Nix manifest surfaces across the tree, plus safer URL credential stripping and Unicode-preserving author normalization across release-note metadata |
| [numtide/devshell @ 255a2b1](https://github.com/numtide/devshell/tree/255a2b1725a20d060f566e4755dbf571bbbb5f76)<br>84 files | 2026-04-12 · devshell-83906 · macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc | Provenant: 10.57s<br>ScanCode: 37.57s<br>**3.55× faster (-71.9%)** | Broader Nix package and dependency extraction (`5` vs `0` packages, `17` vs `0` dependencies) from committed `flake.lock` inputs, root `default.nix`, and template flake surfaces, with cleaner root-package visibility on flake-compat-backed entrypoints that ScanCode leaves unassembled |
+| [ocaml/dune @ b13ab94](https://github.com/ocaml/dune/tree/b13ab949e185a205a39eb6163eea050b7d60a047)<br>7,751 files | 2026-04-22 · dune-32635 · macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc | Provenant: 20.74s<br>ScanCode: 519.01s<br>**25.02× faster (-96.0%)** | Broader opam and Nix package visibility (`4` vs `2` packages, `130` vs `116` dependencies) from the generated `opam/*.opam` manifests and `flake.lock`, with structured opam description, maintainer, and dependency recovery instead of ScanCode's field-bleeding author text on those manifests |
+| [ocaml/merlin @ 30b4f24](https://github.com/ocaml/merlin/tree/30b4f24fdd76fdbf32685aac73de7fd4a6ff7470)<br>2,120 files | 2026-04-22 · merlin-47624 · macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc | Provenant: 31.93s<br>ScanCode: 656.13s<br>**20.55× faster (-95.1%)** | Direct opam package visibility (`1` vs `0` packages) with broader dependency extraction (`27` vs `24`) from the repo-root `merlin*.opam`, `dot-merlin-reader.opam`, `ocaml-index.opam`, and `flake.lock` surfaces, plus Unicode-preserving copyright normalization across the Merlin source tree |
+| [ocaml/ocaml-lsp @ 788ff73](https://github.com/ocaml/ocaml-lsp/tree/788ff738991189537141776bfa07652547bff9c4)<br>546 files | 2026-04-22 · ocaml-lsp-41966 · macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc | Provenant: 13.83s<br>ScanCode: 185.33s<br>**13.40× faster (-92.5%)** | Broader opam package visibility (`3` vs `1` packages) with slightly richer dependency extraction (`380` vs `376`) from the root and submodule `.opam` manifests plus `flake.lock`, with cleaner maintainer and email recovery on opam metadata and Unicode-preserving copyright normalization |
| [univention/Nubus @ fef2258](https://github.com/univention/Nubus/tree/fef2258483c56cce0e1f14e4c8d8fce24d26b891)<br>16 files | 2026-04-19 · Nubus-321 · macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 10 proc | Provenant: 10.53s<br>ScanCode: 72.03s<br>**6.84× faster (-85.4%)** | Direct `publiccode.yml` package visibility on the root metadata file (`1` vs `0` on that file), with cleaner SPDX copyright placeholder normalization for `Univention GmbH` and the same zero-scan-error behavior under the shared profile |
| [yesodweb/yesod @ 1b033c7](https://github.com/yesodweb/yesod/tree/1b033c741ce81d01070de993b285a17e71178156)<br>324 files | 2026-04-17 · yesod-71400 · macOS 26.3.1 · Apple M1 Max · 32 GB · arm64 · 9 proc | Provenant: 10.62s<br>ScanCode: 99.03s<br>**9.32× faster (-89.3%)** | Broader multi-package Hackage extraction (`16` vs `0` packages, `391` vs `0` dependencies) from the repo's many sibling `yesod-*/*.cabal` manifests, with explicit package identities across the Yesod family where ScanCode stays manifest-blind |
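
The per-row figures above (for example `1.68× faster (-40.6%)`) appear consistent with dividing the ScanCode time by the Provenant time and taking the relative change in wall-clock time. The following is a minimal Rust sketch of that arithmetic, together with the median and geometric mean used in the summary line. It is not the `generate-benchmark-chart` implementation; every function name here is hypothetical, and the three sample rows are copied from the table.

```rust
/// Speedup factor: how many times faster the candidate is than the baseline.
/// (Hypothetical helper, not taken from the xtask source.)
fn speedup(candidate_secs: f64, baseline_secs: f64) -> f64 {
    baseline_secs / candidate_secs
}

/// Relative change in wall-clock time, in percent (negative means less time).
fn percent_change(candidate_secs: f64, baseline_secs: f64) -> f64 {
    (candidate_secs - baseline_secs) / baseline_secs * 100.0
}

/// Median of a non-empty slice (assumption: callers never pass an empty slice).
fn median(values: &[f64]) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

/// Geometric mean of a non-empty slice of positive values, computed via logs.
fn geometric_mean(values: &[f64]) -> f64 {
    let log_sum: f64 = values.iter().map(|v| v.ln()).sum();
    (log_sum / values.len() as f64).exp()
}

fn main() {
    // (Provenant seconds, ScanCode seconds) from three rows of the table above.
    let rows = [(19.91, 33.50), (21.86, 104.41), (10.57, 37.57)];
    let speedups: Vec<f64> = rows.iter().map(|&(p, s)| speedup(p, s)).collect();
    for (&(p, s), x) in rows.iter().zip(&speedups) {
        println!("{:.2}x faster ({:+.1}%)", x, percent_change(p, s));
    }
    println!(
        "median {:.2}x, geometric mean {:.2}x",
        median(&speedups),
        geometric_mean(&speedups)
    );
}
```

The geometric mean is the usual choice for averaging ratios such as speedups, because a 2× win and a 2× loss cancel to 1× rather than averaging to 1.25×.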

18 changes: 18 additions & 0 deletions docs/benchmarks/scan-duration-vs-files.svg
@@ -106,7 +106,7 @@ The ranking below is ordered by **practical verification value first**: broad ec
| 32 | Linux Distro (`os-release`) | ⚪ Planned | Debian base-image rootfs snapshot<br>Fedora base-image rootfs snapshot<br>Distroless `base-debian12` rootfs snapshot | This row is rootfs-only on purpose. Debian and Fedora give conventional distro metadata layouts, while Distroless shows the minimal-image case where `os-release` may be one of the few package-identity signals present. Watch path/layout differences and do not treat intentionally sparse distroless metadata as a parser regression by itself. |
| 33 | AboutCode | ⚪ Planned | `aboutcode-org/scancode-toolkit` (10k–50k files)<br>`aboutcode-org/scancode.io` (500–2k files)<br>`aboutcode-org/dejacode` (500–2k files) | Niche but very high-fit `.ABOUT` lane. `aboutcode-org/scancode-toolkit` is the broadest real-world `.ABOUT` reference, while `aboutcode-org/scancode.io` and `aboutcode-org/dejacode` provide smaller product-style contrasts. Watch `.ABOUT` extraction staying visible beside denser package, README, and license output in these application trees. |
| 34 | Hex / Elixir | ⚪ Planned | `phoenixframework/phoenix` (500–2k files)<br>`elixir-ecto/ecto` (500–2k files)<br>`elixir-plug/plug` (<500 files) | Useful ecosystem, but current Rust scope is still the lockfile/static subset, so this ranks below the broader mainstream families. |
-| 35 | OCaml / opam | ⚪ Planned | `ocaml/dune` (500–2k files)<br>`ocaml/ocaml-lsp` (500–2k files)<br>`ocaml/merlin` (500–2k files) | Good `opam` coverage, but lower practical verification priority than the broader ecosystems above. |
+| 35 | OCaml / opam | 🟢 Verified | `ocaml/dune` (500–2k files)<br>`ocaml/ocaml-lsp` (500–2k files)<br>`ocaml/merlin` (500–2k files) | Good `opam` coverage, but lower practical verification priority than the broader ecosystems above. |
| 36 | Buck | 🟢 Verified | `facebook/buck2` (2k–10k files)<br>`facebook/watchman` (500–2k files)<br>`facebook/react-native` (10k–50k files) | Real Buck lane, even if narrower than Bazel in practice. `facebook/buck2` is the canonical direct reference, `facebook/watchman` is a smaller focused contrast, and `facebook/react-native` adds a large mixed-language consumer tree. Watch Buck metadata separately from the rest of the monorepo so unrelated JS/native/common-profile noise does not hide actual build-metadata gaps. |
| 37 | FreeBSD | ⚪ Planned | FreeBSD `pkg` package archive sample<br>FreeBSD `bash` package archive sample<br>FreeBSD `curl` package archive sample | Important artifact-family support, but narrower day-to-day scan prevalence than the higher-priority distro lanes. |
| 38 | Chef | ⚪ Planned | `sous-chefs/apache2` (<500 files)<br>`sous-chefs/mysql` (<500 files)<br>`chef/chef` (2k–10k files) | Worth covering, but lower priority than the mainstream language and distro families. |