feat(hfh):SP-4181 implement raw output format for folder hashing by agustingroh · Pull Request #200 · scanoss/scanoss.py

agustingroh · 2026-03-24T15:28:55Z

Summary by CodeRabbit

New Features
- Added a raw output format for the folder-scan command (in addition to json and cyclonedx). raw emits per-file snippet-compatible JSON, expands directory-level matches into file-level entries, picks the most specific match per file, and includes a computed MD5 file hash. Default output remains json.
Documentation
- CHANGELOG updated to describe the new raw format and its per-file output.

github-actions · 2026-03-24T15:29:38Z

SCANOSS SCAN Completed 🚀

Detected components: 2
Undeclared components: 0
Declared components: 2
Detected files: 111
Detected files undeclared: 0
Detected files declared: 111
Licenses detected: 1
Licenses detected with copyleft: 0
Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

coderabbitai · 2026-03-24T15:34:09Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Added a raw output format to the folder-scan CLI and implemented HFH scanner raw output: directory-level HFH matches are expanded into per-file snippet-scanner JSON entries keyed by relative path, computing MD5 file_hash, mapping file URLs, and assigning each file to the most-specific path_id.

Changes

Cohort / File(s)	Summary
CLI Format Option `src/scanoss/cli.py`	Added `raw` to the `--format/-f` choices for the `folder-scan` subcommand (`json`, `cyclonedx`, `raw`).
HFH Scanner Raw Output Implementation `src/scanoss/scanners/scanner_hfh.py`	Added `config` parameter to `ScannerHFHPresenter.__init__`; implemented `_format_raw_output()` to emit snippet-scanner JSON by expanding directory-level HFH matches into per-file entries; added helpers `_extract_best_components`, `_file_matches_path_id`, `_compute_file_md5`, `_build_snippet_entry`; ensures deepest (most-specific) `path_id` wins, computes MD5 (empty on read failure), and sets `file_url` using scanner client origin when available.
Changelog `CHANGELOG.md`	Documented the new `--format raw` option and described the raw/snippet-scanner output behavior (per-file expansion, MD5, path_id selection).

Sequence Diagram

sequenceDiagram
    actor User
    participant CLI as CLI Handler
    participant HFH as HFH Scanner
    participant Presenter as HFH Presenter
    participant FS as File System

    User->>CLI: folder-scan --format raw
    CLI->>HFH: run scan (raw)
    HFH->>HFH: perform HFH scan -> results
    HFH->>Presenter: _format_raw_output(results, scan_root)
    Presenter->>Presenter: _extract_best_components(results)
    loop per path_id (deepest-first)
        Presenter->>FS: list files under scan_root matching path_id
        FS-->>Presenter: file list
        loop per matched file
            Presenter->>FS: read file bytes
            FS-->>Presenter: file bytes
            Presenter->>Presenter: _compute_file_md5(bytes)
            Presenter->>Presenter: _build_snippet_entry(file, component, version, md5)
        end
    end
    Presenter-->>HFH: aggregated JSON (relpath -> entries)
    HFH-->>User: output raw JSON

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

[SP-2870] fix: cyclonedx format output not writing to file #130: Modifies ScannerHFHPresenter in the same file; related to presenter and output handling.
[SP-2587] Add directory simhash, modify concatenated names to remove extensions #119: Adjusts ScannerHFH/presenter behavior and output formatting; touches similar conversion logic.
[SP-2991] feat: add recursive search support for folder hashing #141: Alters HFH scanning implementation and parameters; related to scanner flow changes.

Suggested labels

enhancement

Suggested reviewers

eeisegn

Poem

🐰 I hopped through folders, sniffed each file,
I mapped each byte and grinned a while.
MD5 on fur, JSON in paw—so neat,
Per-file snippets ready, tidy and complete.
Hooray, the scanner’s done a cheerful feat! 🎉

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title clearly describes the main change: implementing a raw output format for the HFH (folder hashing) feature, which is the primary objective across all modified files.
Docstring Coverage	✅ Passed	Docstring coverage is 88.89% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/SP-4181-raw-output-format-hfh

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-03-24T15:34:32Z

SCANOSS SCAN Completed 🚀

Detected components: 2
Undeclared components: 0
Declared components: 2
Detected files: 111
Detected files undeclared: 0
Detected files declared: 111
Licenses detected: 1
Licenses detected with copyleft: 0
Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

coderabbitai

🧹 Nitpick comments (1)

src/scanoss/scanners/scanner_hfh.py (1)
347-364: Consider adding usedforsecurity=False for FIPS-compliant environments.

On FIPS-enabled systems (Python 3.9+), hashlib.md5() without usedforsecurity=False can raise a ValueError. Since this hash is used for file identification (not cryptographic security), explicitly marking it as such improves compatibility.
♻️ Suggested change
     try:
-        return hashlib.md5(file_path.read_bytes()).hexdigest()
+        return hashlib.md5(file_path.read_bytes(), usedforsecurity=False).hexdigest()
     except (OSError, IOError):
         return ''
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/scanoss/scanners/scanner_hfh.py` around lines 347 - 364, Change
_compute_file_md5 to call hashlib.md5 with usedforsecurity=False when
constructing the hasher and ensure ValueError is handled: replace the current
hashlib.md5(file_path.read_bytes()).hexdigest() call with
hashlib.md5(file_path.read_bytes(), usedforsecurity=False).hexdigest() and
include ValueError in the except clause alongside OSError/IOError so that
FIPS-enabled Python 3.9+ environments won't raise uncaught exceptions and the
function still returns '' on failure.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/scanoss/scanners/scanner_hfh.py`:
- Around line 347-364: Change _compute_file_md5 to call hashlib.md5 with
usedforsecurity=False when constructing the hasher and ensure ValueError is
handled: replace the current hashlib.md5(file_path.read_bytes()).hexdigest()
call with hashlib.md5(file_path.read_bytes(), usedforsecurity=False).hexdigest()
and include ValueError in the except clause alongside OSError/IOError so that
FIPS-enabled Python 3.9+ environments won't raise uncaught exceptions and the
function still returns '' on failure.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bf2e7adb-2d6d-456c-9d53-a5baf55ddc80

📥 Commits

Reviewing files that changed from the base of the PR and between 7974777 and e1c914c.

📒 Files selected for processing (2)

src/scanoss/cli.py
src/scanoss/scanners/scanner_hfh.py

github-actions · 2026-03-24T16:07:51Z

SCANOSS SCAN Completed 🚀

Detected components: 2
Undeclared components: 0
Declared components: 2
Detected files: 110
Detected files undeclared: 0
Detected files declared: 110
Licenses detected: 1
Licenses detected with copyleft: 0
Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

github-actions · 2026-03-24T16:49:43Z

SCANOSS SCAN Completed 🚀

Detected components: 2
Undeclared components: 0
Declared components: 2
Detected files: 110
Detected files undeclared: 0
Detected files declared: 110
Licenses detected: 1
Licenses detected with copyleft: 0
Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

src/scanoss/scanners/scanner_hfh.py (1)

359-376: Optional: Consider chunked reading for very large files

The current implementation matches winnowing.py's approach of computing MD5 on full file content at once. If large files are a concern, chunked reading is worth considering for memory efficiency:

♻️ Optional: chunked MD5 computation

     `@staticmethod`
     def _compute_file_md5(file_path: Path) -> str:
         try:
-            return hashlib.md5(file_path.read_bytes()).hexdigest()
+            md5_hash = hashlib.md5()
+            with open(file_path, 'rb') as f:
+                for chunk in iter(lambda: f.read(8192), b''):
+                    md5_hash.update(chunk)
+            return md5_hash.hexdigest()
         except (OSError, IOError):
             return ''

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/scanoss/scanners/scanner_hfh.py` around lines 359 - 376, The
_compute_file_md5 function currently reads the entire file into memory with
file_path.read_bytes(), which can OOM on very large files; change it to perform
chunked reads using file_path.open('rb') and iteratively update a hashlib.md5()
object (e.g., read in 64KB or 1MB chunks and call md5.update(chunk)) so behavior
matches winnowing.py but is memory-efficient, and keep the existing exception
handling that returns '' on OSError/IOError; reference _compute_file_md5 and the
winnowing.py MD5 approach when applying the change.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/scanoss/scanners/scanner_hfh.py`:
- Around line 339-357: The matching logic in _file_matches_path_id uses os.sep
and can fail when path_id uses forward slashes from the API; update the function
to normalize and compare using pathlib.PurePath: create PurePath objects for
file_path and path_id (e.g., file_p = PurePath(file_path), path_p =
PurePath(path_id)) and return True if path_id is '.' or file_p == path_p or
path_p in file_p.parents, ensuring cross-platform correctness without relying on
os.sep.
- Around line 285-289: The sorting and matching use os.sep so paths from HFH
(which use forward slashes) are mis-counted and don't match; update the sorting
of sorted_path_ids (the sorted(..., key=lambda p: ...)) to compute depth by
counting both '/' and os.sep or normalize path_id to use '/' consistently, and
also change the file-matching logic that checks path_id + os.sep (and references
all_files) to use the same normalized separator (e.g., '/' ) or normalize
all_files entries before comparison so path_components keys and file paths use
the same separator format; ensure you update both the lambda key used for
sorted_path_ids and the matching condition that builds path prefix checks so
they are consistent cross-platform.

---

Nitpick comments:
In `@src/scanoss/scanners/scanner_hfh.py`:
- Around line 359-376: The _compute_file_md5 function currently reads the entire
file into memory with file_path.read_bytes(), which can OOM on very large files;
change it to perform chunked reads using file_path.open('rb') and iteratively
update a hashlib.md5() object (e.g., read in 64KB or 1MB chunks and call
md5.update(chunk)) so behavior matches winnowing.py but is memory-efficient, and
keep the existing exception handling that returns '' on OSError/IOError;
reference _compute_file_md5 and the winnowing.py MD5 approach when applying the
change.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f000dc68-5256-40b8-aeb8-9ea7063086f4

📥 Commits

Reviewing files that changed from the base of the PR and between 23636aa and 159dda6.

📒 Files selected for processing (3)

CHANGELOG.md
src/scanoss/cli.py
src/scanoss/scanners/scanner_hfh.py

✅ Files skipped from review due to trivial changes (2)

CHANGELOG.md
src/scanoss/cli.py

src/scanoss/scanners/scanner_hfh.py

github-actions · 2026-03-24T18:40:01Z

SCANOSS SCAN Completed 🚀

Detected components: 2
Undeclared components: 0
Declared components: 2
Detected files: 110
Detected files undeclared: 0
Detected files declared: 110
Licenses detected: 1
Licenses detected with copyleft: 0
Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (2)

src/scanoss/scanners/scanner_hfh.py (2)

369-372: ⚠️ Potential issue | 🟠 Major

Same cross-platform path separator issue in matching logic.

Line 372 uses os.sep for the prefix check, which fails when path_id uses forward slashes (from API) but file_path uses backslashes (Windows filesystem).

🐛 Proposed fix using pathlib

     `@staticmethod`
     def _file_matches_path_id(file_path: str, path_id: str) -> bool:
-        """
-        Check if a file path belongs under a given path_id directory.
-
-        Both file_path and path_id are relative to the scan root directory.
-        A path_id of '.' matches all files (root directory).
-
-        Args:
-            file_path (str): Relative file path from the scan root.
-            path_id (str): Relative directory path from the HFH result.
-
-        Returns:
-            bool: True if the file is under the given path_id directory.
-        """
+        """Check if a file path belongs under a given path_id directory."""
         if path_id == '.':
             return True
-        # file_path and path_id are both relative to scan_dir
-        return file_path == path_id or file_path.startswith(path_id + os.sep)
+        # Normalize to forward slashes for consistent comparison
+        norm_file = file_path.replace(os.sep, '/')
+        norm_path = path_id.replace(os.sep, '/')
+        return norm_file == norm_path or norm_file.startswith(norm_path + '/')

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/scanoss/scanners/scanner_hfh.py` around lines 369 - 372, The matching
uses os.sep which breaks on Windows when path_id comes from the API with forward
slashes; to fix, normalize both file_path and path_id to path objects and
compare by path parts instead of string prefixes: convert file_path and path_id
to pathlib.Path (or PurePath) relative to scan_dir and then return True if
path_parts == id_parts or path_parts startswith id_parts (i.e.
file_parts[:len(id_parts)] == id_parts); keep the existing special-case for
path_id == '.' and update the code around the file_path/path_id comparison to
use these Path-based comparisons.

300-304: ⚠️ Potential issue | 🟠 Major

Fix cross-platform path separator mismatch in sorting.

The HFH API returns path_id with forward slashes (e.g., 'project-1.0/src/lib'), but the sorting uses os.sep which is \\ on Windows. This causes incorrect depth calculation on Windows, breaking the "deepest first" sort order.

🐛 Proposed fix

         sorted_path_ids = sorted(
             path_components.keys(),
-            key=lambda p: (-1, 0) if p == '.' else (p.count(os.sep), len(p)),
+            key=lambda p: (-1, 0) if p == '.' else (p.count('/'), len(p)),
             reverse=True,
         )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/scanoss/scanners/scanner_hfh.py` around lines 300 - 304, The sorting of
path_components into sorted_path_ids uses os.sep to count path depth, which
fails on Windows because HFH path_id uses forward slashes; update the key in the
sorted call (the lambda used to build the sort key for sorted_path_ids) to
compute depth using the HFH path separator '/' (or use posixpath functions)
instead of os.sep, keeping the special-case for '.' and preserving the "deepest
first" ordering and tie-breaker on length.

🧹 Nitpick comments (1)

src/scanoss/scanners/scanner_hfh.py (1)

170-181: Remove unused config parameter or store it if needed.

The config parameter is accepted but never assigned to self. The docstring claims it's used for "API base URL for constructing file_url", but line 321 actually uses self.scanner.client.orig_url instead. Either remove the unused parameter or store it if it's intended for future use.

🧹 Proposed fix to remove unused parameter

-    def __init__(self, scanner: ScannerHFH, config: ScannerConfig = None, **kwargs):
+    def __init__(self, scanner: ScannerHFH, **kwargs):
         """
         Initialize the presenter.

         Args:
             scanner (ScannerHFH): The HFH scanner instance containing scan results and file filters.
-            config (ScannerConfig, optional): Scanner configuration, used to access the API base URL
-                for constructing file_url in raw output format.
             **kwargs: Additional arguments passed to AbstractPresenter (debug, trace, quiet, etc.).
         """
         super().__init__(**kwargs)
         self.scanner = scanner

And update the instantiation at line 90-96:

         self.presenter = ScannerHFHPresenter(
             self,
             debug=config.debug,
             trace=config.trace,
             quiet=config.quiet,
-            config=config,
         )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/scanoss/scanners/scanner_hfh.py` around lines 170 - 181, The __init__ of
the presenter accepts a config param but never stores or uses it; either remove
the unused parameter from the signature (and update all instantiation sites of
the presenter) or persist it (self.config = config) and replace uses of
self.scanner.client.orig_url when constructing file_url with the intended config
value (e.g., self.config.api_base_url) so the docstring is accurate; update the
constructor in the class __init__ and any call sites that pass config
accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/scanoss/scanners/scanner_hfh.py`:
- Around line 393-436: The _build_snippet_entry function currently has an overly
long line; an unused variable score; and hardcodes 'matched' to '100%'.
Shorten/reflow any lines >120 chars (split long docstring lines or wrap long
dict value expressions) to satisfy the line-length rule, and use the existing
score variable to populate the matched field (e.g., set matched to a string
derived from score like f"{score}%" or otherwise format score appropriately) so
score is no longer unused; ensure you reference _build_snippet_entry and the
variables score and matched when making the change.
- Around line 318-322: Guard against a None scanner client and wrap long line:
when constructing entry in the scanner_hfh logic (around claimed_files,
self._compute_file_md5 and self._build_snippet_entry), replace the direct access
to self.scanner.client.orig_url with a safe expression that uses a fallback
(e.g., (self.scanner.client.orig_url if self.scanner.client else '')) and break
the long statement into multiple shorter lines so it stays under 120 characters;
ensure you still pass the derived file_hash from _compute_file_md5 and the
component/best_version to _build_snippet_entry.

---

Duplicate comments:
In `@src/scanoss/scanners/scanner_hfh.py`:
- Around line 369-372: The matching uses os.sep which breaks on Windows when
path_id comes from the API with forward slashes; to fix, normalize both
file_path and path_id to path objects and compare by path parts instead of
string prefixes: convert file_path and path_id to pathlib.Path (or PurePath)
relative to scan_dir and then return True if path_parts == id_parts or
path_parts startswith id_parts (i.e. file_parts[:len(id_parts)] == id_parts);
keep the existing special-case for path_id == '.' and update the code around the
file_path/path_id comparison to use these Path-based comparisons.
- Around line 300-304: The sorting of path_components into sorted_path_ids uses
os.sep to count path depth, which fails on Windows because HFH path_id uses
forward slashes; update the key in the sorted call (the lambda used to build the
sort key for sorted_path_ids) to compute depth using the HFH path separator '/'
(or use posixpath functions) instead of os.sep, keeping the special-case for '.'
and preserving the "deepest first" ordering and tie-breaker on length.

---

Nitpick comments:
In `@src/scanoss/scanners/scanner_hfh.py`:
- Around line 170-181: The __init__ of the presenter accepts a config param but
never stores or uses it; either remove the unused parameter from the signature
(and update all instantiation sites of the presenter) or persist it (self.config
= config) and replace uses of self.scanner.client.orig_url when constructing
file_url with the intended config value (e.g., self.config.api_base_url) so the
docstring is accurate; update the constructor in the class __init__ and any call
sites that pass config accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 325cc9e5-6356-4ce0-92ec-baf2f33a0212

📥 Commits

Reviewing files that changed from the base of the PR and between 159dda6 and 3ef207e.

📒 Files selected for processing (3)

CHANGELOG.md
src/scanoss/cli.py
src/scanoss/scanners/scanner_hfh.py

✅ Files skipped from review due to trivial changes (1)

CHANGELOG.md

src/scanoss/scanners/scanner_hfh.py

github-actions · 2026-03-24T18:54:57Z

SCANOSS SCAN Completed 🚀

Detected components: 2
Undeclared components: 0
Declared components: 2
Detected files: 110
Detected files undeclared: 0
Detected files declared: 110
Licenses detected: 1
Licenses detected with copyleft: 0
Policies: ✅ 1 pass (1 total)

View more details on SCANOSS Action Summary

coderabbitai

🧹 Nitpick comments (2)

src/scanoss/scanners/scanner_hfh.py (2)

394-438: Consider using the actual HFH score for the matched field.

The matched field is hardcoded to '100%' (line 421), but best_version contains the actual HFH score that would provide more accurate match information to downstream consumers.

♻️ Proposed enhancement

     def _build_snippet_entry(
         component: Dict, best_version: Dict, file_path: str, file_hash: str, base_url: str,
     ) -> Dict:
         ...
         purl = component.get('purl', '')
+        score = best_version.get('score', 1.0)
         version = best_version.get('version', '')

         url = purl2url.get_repo_url(purl) if purl else ''
         return {
             'id': 'file',
-            'matched': '100%',
+            'matched': f'{int(score * 100)}%',
             'purl': [purl],
             ...
         }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/scanoss/scanners/scanner_hfh.py` around lines 394 - 438, In
_build_snippet_entry, replace the hardcoded 'matched': '100%' value with the
actual HFH score from best_version (e.g., best_version.get('score')) formatted
as a percent string (e.g., f"{score}%" or similar), falling back to '100%' when
score is missing or not numeric; ensure you reference the best_version dict and
coerce the score to a string with a trailing '%' before returning the dict so
downstream consumers receive the real HFH match percentage.

170-181: Remove or use the unused config parameter.

The config parameter is accepted but never stored or used. Line 181 only stores self.scanner, and _format_raw_output() at line 321 accesses self.scanner.client.orig_url rather than anything from config.

Either store config as an instance attribute if needed for future use, or remove it to avoid confusion:

♻️ Option 1: Remove unused parameter

-    def __init__(self, scanner: ScannerHFH, config: ScannerConfig = None, **kwargs):
+    def __init__(self, scanner: ScannerHFH, **kwargs):
         """
         Initialize the presenter.

         Args:
             scanner (ScannerHFH): The HFH scanner instance containing scan results and file filters.
-            config (ScannerConfig, optional): Scanner configuration, used to access the API base URL
-                for constructing file_url in raw output format.
             **kwargs: Additional arguments passed to AbstractPresenter (debug, trace, quiet, etc.).
         """
         super().__init__(**kwargs)
         self.scanner = scanner

And update the caller at line 95:

         self.presenter = ScannerHFHPresenter(
             self,
             debug=config.debug,
             trace=config.trace,
             quiet=config.quiet,
-            config=config,
         )

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@src/scanoss/scanners/scanner_hfh.py` around lines 170 - 181, The __init__
signature for the presenter currently accepts a config: ScannerConfig parameter
that is never used; either remove the unused parameter from __init__ (and update
callers that pass config) or persist it as an instance attribute (self.config =
config) so future methods can rely on it; if choosing to keep it, also update
any references that should use config (e.g. in _format_raw_output) instead of
accessing self.scanner.client.orig_url, and ensure the ScannerHFH
constructor/call sites are updated accordingly.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/scanoss/scanners/scanner_hfh.py`:
- Around line 394-438: In _build_snippet_entry, replace the hardcoded 'matched':
'100%' value with the actual HFH score from best_version (e.g.,
best_version.get('score')) formatted as a percent string (e.g., f"{score}%" or
similar), falling back to '100%' when score is missing or not numeric; ensure
you reference the best_version dict and coerce the score to a string with a
trailing '%' before returning the dict so downstream consumers receive the real
HFH match percentage.
- Around line 170-181: The __init__ signature for the presenter currently
accepts a config: ScannerConfig parameter that is never used; either remove the
unused parameter from __init__ (and update callers that pass config) or persist
it as an instance attribute (self.config = config) so future methods can rely on
it; if choosing to keep it, also update any references that should use config
(e.g. in _format_raw_output) instead of accessing self.scanner.client.orig_url,
and ensure the ScannerHFH constructor/call sites are updated accordingly.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d146b545-8398-42e3-beb6-0f390ff5249b

📥 Commits

Reviewing files that changed from the base of the PR and between 3ef207e and 46158c4.

📒 Files selected for processing (3)

CHANGELOG.md
src/scanoss/cli.py
src/scanoss/scanners/scanner_hfh.py

✅ Files skipped from review due to trivial changes (1)

CHANGELOG.md

agustingroh force-pushed the feat/SP-4181-raw-output-format-hfh branch from ed33fe3 to e1c914c Compare March 24, 2026 15:33

agustingroh requested a review from isasmendiagus March 24, 2026 15:34

coderabbitai bot reviewed Mar 24, 2026

View reviewed changes

agustingroh force-pushed the feat/SP-4181-raw-output-format-hfh branch from e1c914c to 23636aa Compare March 24, 2026 16:07

agustingroh force-pushed the feat/SP-4181-raw-output-format-hfh branch from 23636aa to 159dda6 Compare March 24, 2026 16:48

coderabbitai bot reviewed Mar 24, 2026

View reviewed changes

src/scanoss/scanners/scanner_hfh.py Show resolved Hide resolved

src/scanoss/scanners/scanner_hfh.py Show resolved Hide resolved

agustingroh force-pushed the feat/SP-4181-raw-output-format-hfh branch from 159dda6 to 3ef207e Compare March 24, 2026 18:39

coderabbitai bot reviewed Mar 24, 2026

View reviewed changes

src/scanoss/scanners/scanner_hfh.py Show resolved Hide resolved

src/scanoss/scanners/scanner_hfh.py Show resolved Hide resolved

feat(hfh):SP-4181 implement raw output format for folder hashing

46158c4

agustingroh force-pushed the feat/SP-4181-raw-output-format-hfh branch from 3ef207e to 46158c4 Compare March 24, 2026 18:54

coderabbitai bot reviewed Mar 24, 2026

View reviewed changes

Conversation

agustingroh commented Mar 24, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

github-actions bot commented Mar 24, 2026

SCANOSS SCAN Completed 🚀

Uh oh!

coderabbitai bot commented Mar 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

github-actions bot commented Mar 24, 2026

SCANOSS SCAN Completed 🚀

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Mar 24, 2026

SCANOSS SCAN Completed 🚀

Uh oh!

github-actions bot commented Mar 24, 2026

SCANOSS SCAN Completed 🚀

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Mar 24, 2026

SCANOSS SCAN Completed 🚀

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Mar 24, 2026

SCANOSS SCAN Completed 🚀

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

agustingroh commented Mar 24, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 24, 2026 •

edited

Loading