Add experiments for preprocessing determinism and Merkle verification by tani-dubey · Pull Request #66 · AOSSIE-Org/OpenVerifiableLLM

tani-dubey · 2026-03-11T08:59:01Z

Addressed Issues:

Fixes #37

This PR adds experiments/ directory containing small reproducible experiments to validate assumptions about the OpenVerifiableLLM pipeline.

It has experiments :

preprocessing determinism experiment
Verifies that running the preprocessing pipeline multiple times on the same dataset produces identical artifacts (processed_sha256, processed_merkle_root, environment_hash).
Merkle root tamper detection experiment
Demonstrates how dataset modifications can be detected using Merkle roots.
small dataset subset for testing
Provides a lightweight Wikipedia XML example to allow experiments to run quickly without requiring large dumps.
experiments README with usage instructions

Additional Notes:

While implementing the tamper detection experiment, I observed that the processed merkle_root alone is unable to show tamper for small change ( like removing or adding a single alphabet from sample.bz2 goes undetected).

To address this, the experiment compares both:

raw_merkle_root
processed_merkle_root

Checklist

My code follows the project's code style and conventions
I have made corresponding changes to the documentation
My changes generate no new warnings or errors
I have joined the Discord server and I will share a link to this PR with the project maintainers there
I have read the Contributing Guidelines

⚠️ AI Notice - Important!

We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact.

Summary by CodeRabbit

Documentation
- Added a comprehensive experiments guide with setup, quick-start commands, directory layout, and expected output formats.
New Features
- Added experiment scripts for deterministic preprocessing checks, tamper-detection via cryptographic root comparison, and a generated tampered dataset sample for testing.

coderabbitai · 2026-03-11T08:59:21Z

Walkthrough

Adds an experiments/ directory with README and three CLI experiment scripts to validate preprocessing determinism, Merkle root tamper detection, and a tampered Wikipedia sample generator; scripts read/produce dataset manifests and compare hashes/roots for verification.

Changes

Cohort / File(s)	Summary
Documentation `experiments/README.md`	New README documenting experiments layout, three experiments (Preprocessing Determinism, Merkle Root Tamper Detection, Dataset Subset), run commands, expected manifest fields (`processed_sha256`, `processed_merkle_root`, `raw_merkle_root`, `environment_hash`).
Determinism Test `experiments/preprocessing_determinism/test_preprocessing.py`	New CLI script that runs preprocessing twice on an XML input, reads `data/dataset_manifest.json`, and compares `processed_sha256`, `processed_merkle_root`, and `environment_hash` for determinism.
Merkle Verification `experiments/merkle_verification/test_merkle.py`	New CLI script that preprocesses inputs (via `extract_text_from_xml`), loads manifest(s), prints `raw_merkle_root` and `processed_merkle_root` for two runs, and reports tampering if roots differ.
Tampered Sample Generator `experiments/data_subset/sample_wiki_generate.py`	New script that writes a bz2-compressed small MediaWiki XML sample with intentional tampering for dataset subset experiments.

Sequence Diagram(s)

sequenceDiagram
  participant User as User (CLI)
  participant Preproc as Preprocessor (extract_text_from_xml)
  participant FS as Filesystem / Manifest (data/dataset_manifest.json)
  participant Comparator as Comparator

  User->>Preproc: provide dataset path
  Preproc->>FS: write/read processed artifacts + manifest
  FS-->>Preproc: return manifest with raw_merkle_root, processed_merkle_root
  Preproc->>User: return manifest fields
  User->>Comparator: provide two manifest results
  Comparator->>Comparator: compare raw_merkle_root and processed_merkle_root
  Comparator-->>User: report "tampering detected" or "identical"

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

feat: add deterministic preprocessing and dataset identity tracking #2: Adds/updates utility functions (e.g., extract_text_from_xml) and manifest generation used by the new experiment scripts.
feat: deterministic tokenizer training and config hashing #17: Defines the manifest structure and fields (processed_sha256, processed_merkle_root, environment_hash) that experiment scripts depend on.

Suggested labels

Python Lang, Documentation

Suggested reviewers

Archit381

Poem

🐰 I hopped through manifests and roots so fine,
I nibbled a wiki, removed one little line.
I checked the hashes, compared them twice,
Found tamper tracks and said, "How nice!"
Hooray for experiments — crunchy, cryptic, divine.

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately reflects the main changes: adding experiments for preprocessing determinism and Merkle verification, which are central to this PR.
Linked Issues check	✅ Passed	The PR implements core objectives from issue `#37`: creates an experiments/ directory with three functional experiments (preprocessing determinism, Merkle tamper detection, dataset subset) plus README, validating determinism and verification assumptions as requested.
Out of Scope Changes check	✅ Passed	All changes are scoped to the experiments/ directory and directly support the three proposed experiment categories from issue `#37`; no extraneous modifications to main codebase detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@experiments/data_subset/sample_wiki_generate.py`:
- Line 3: Fix the minor typo in the inline comment in sample_wiki_generate.py:
replace the current comment "# To make this tampered I deleted e of online" with
a clearer phrasing such as "# To make this tampered I deleted the 'e' in
'online'." to correct grammar and clarify intent; update the comment near the
tampering logic or the top of the file where this sentence appears.
- Around line 19-20: The file currently performs file I/O (the with
bz2.open(...) block that writes xml_content to
"experiments/data_subset/tampered_sample_wiki.xml.bz2") at import time; move
that write block into an if __name__ == "__main__": guard so the write only runs
when the script is executed directly, ensuring xml_content is still defined or
constructed above the guard (or by a helper function you call from the guard)
and preventing side effects on import.

In `@experiments/merkle_verification/test_merkle.py`:
- Line 22: Update the comment in experiments/merkle_verification/test_merkle.py
that currently reads "#read genertaed manifest" to correct the typo to "#read
generated manifest" so the inline comment in the test function (around the
manifest-read logic) accurately reflects the operation.
- Around line 1-5: The import block is unsorted causing CI failure; reorder the
imports in experiments/merkle_verification/test_merkle.py so standard library
imports are alphabetized (e.g., argparse, json, logging, pathlib.Path) followed
by third-party/local imports like openverifiablellm.utils and its
extract_text_from_xml, or simply run the linter auto-fix (ruff --fix) to apply
the correct sort automatically.

In `@experiments/preprocessing_determinism/test_preprocessing.py`:
- Around line 38-41: The call to logging.basicConfig has inconsistent
indentation for its arguments; align the keyword arguments (level=logging.INFO,
format="%(levelname)s - %(message)s") vertically under the function call so
indentation is consistent with surrounding code (e.g., indent the two argument
lines to match the opening parenthesis of logging.basicConfig). Locate the
logging.basicConfig call in test_preprocessing.py and adjust the argument
indentation so the block is consistently formatted.
- Line 21: Fix the typo in the inline comment "#read genertaed manifest" by
changing it to "# read generated manifest" (or similar correct phrasing) so the
comment referencing the manifest read is clear; locate the comment in
experiments/preprocessing_determinism/test_preprocessing.py (look for the exact
string "read genertaed manifest") and update it accordingly.
- Line 10: Update the descriptive comment in
experiments/preprocessing_determinism/test_preprocessing.py by fixing the typo:
replace "compairing" with "comparing" in the sentence "Experiment to test
Deterministic preprocessing, by compairing generated hash on 2 runs." so it
reads "Experiment to test Deterministic preprocessing, by comparing generated
hash on 2 runs." Ensure you edit the same comment or docstring where that phrase
appears.
- Around line 1-5: The import block in test_preprocessing.py is unsorted;
reorder the imports to follow canonical groups and alphabetical order (stdlib
first: import json, import logging, import sys; then third-party/other: from
pathlib import Path; then local: from openverifiablellm.utils import
extract_text_from_xml), or simply run the linter auto-fix (ruff --fix) to apply
the correct sorting; update the import statement order in the module (look for
the top-of-file imports: sys, json, logging, Path, extract_text_from_xml)
accordingly.

In `@experiments/README.md`:
- Line 1: Change the first line heading from a level-2 to a level-1 heading by
replacing "## Experiments" with "# Experiments" so the README starts with a
top-level h1; update the header text at the top of the file (the "##
Experiments" line) to "# Experiments".
- Around line 69-78: Convert the plain text items into a proper Markdown list by
prefixing each entry with a consistent list marker (e.g., "- "): update
"sample_wiki.xml.bz2 (original)", "tampered_sample_wiki.xml.bz2 (modified)", and
the two comparison keys "raw_merkle_root" and "processed_merkle_root" so they
appear as list items under the sentence "The experiment compares:". Ensure the
same marker is used for all four lines to match the document's Markdown style.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 34e326dd-7c10-40cb-a918-21f1ebd91fd8

📥 Commits

Reviewing files that changed from the base of the PR and between a2dfd7a and e7f133a.

📒 Files selected for processing (4)

experiments/README.md
experiments/data_subset/sample_wiki_generate.py
experiments/merkle_verification/test_merkle.py
experiments/preprocessing_determinism/test_preprocessing.py

coderabbitai · 2026-03-11T09:03:45Z

experiments/data_subset/sample_wiki_generate.py

@@ -0,0 +1,20 @@
+import bz2
+
+# To make this tampered I deleted e of online


⚠️ Potential issue | 🟡 Minor

Minor typo in comment.

📝 Proposed fix

-# To make this tampered I deleted e of online +# To make this tampered, I deleted 'e' from "online"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# To make this tampered I deleted e of online

# To make this tampered, I deleted 'e' from "online"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@experiments/data_subset/sample_wiki_generate.py` at line 3, Fix the minor typo in the inline comment in sample_wiki_generate.py: replace the current comment "# To make this tampered I deleted e of online" with a clearer phrasing such as "# To make this tampered I deleted the 'e' in 'online'." to correct grammar and clarify intent; update the comment near the tampering logic or the top of the file where this sentence appears.

coderabbitai · 2026-03-11T09:03:45Z

experiments/data_subset/sample_wiki_generate.py

+with bz2.open("experiments/data_subset/tampered_sample_wiki.xml.bz2", "wt", encoding="utf-8") as f:
+    f.write(xml_content)


🧹 Nitpick | 🔵 Trivial

Wrap file-writing logic in an if __name__ == "__main__" guard.

The script executes file I/O at module import time. This can cause unintended side effects if the module is imported elsewhere. Adding a main guard is a best practice for executable scripts.

♻️ Proposed fix

-with bz2.open("experiments/data_subset/tampered_sample_wiki.xml.bz2", "wt", encoding="utf-8") as f: - f.write(xml_content) +if __name__ == "__main__": + with bz2.open("experiments/data_subset/tampered_sample_wiki.xml.bz2", "wt", encoding="utf-8") as f: + f.write(xml_content)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

with bz2.open("experiments/data_subset/tampered_sample_wiki.xml.bz2", "wt", encoding="utf-8") as f:

f.write(xml_content)

if __name__ == "__main__":

with bz2.open("experiments/data_subset/tampered_sample_wiki.xml.bz2", "wt", encoding="utf-8") as f:

f.write(xml_content)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@experiments/data_subset/sample_wiki_generate.py` around lines 19 - 20, The file currently performs file I/O (the with bz2.open(...) block that writes xml_content to "experiments/data_subset/tampered_sample_wiki.xml.bz2") at import time; move that write block into an if __name__ == "__main__": guard so the write only runs when the script is executed directly, ensuring xml_content is still defined or constructed above the guard (or by a helper function you call from the guard) and preventing side effects on import.

experiments/merkle_verification/test_merkle.py

coderabbitai · 2026-03-11T09:03:45Z

experiments/merkle_verification/test_merkle.py

+    """Run preprocessing and return processed Merkle root."""
+    extract_text_from_xml(path1)
+
+    #read genertaed manifest


⚠️ Potential issue | 🟡 Minor

Fix typo: "genertaed" → "generated".

📝 Proposed fix

- `#read` genertaed manifest + # Read generated manifest

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

#read genertaed manifest

# Read generated manifest

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@experiments/merkle_verification/test_merkle.py` at line 22, Update the comment in experiments/merkle_verification/test_merkle.py that currently reads "#read genertaed manifest" to correct the typo to "#read generated manifest" so the inline comment in the test function (around the manifest-read logic) accurately reflects the operation.

experiments/preprocessing_determinism/test_preprocessing.py

coderabbitai · 2026-03-11T09:03:45Z

experiments/preprocessing_determinism/test_preprocessing.py

+logger = logging.getLogger(__name__)
+
+"""
+Experiment to test Deterministic preprocessing, by compairing generated hash on 2 runs.


⚠️ Potential issue | 🟡 Minor

Fix typo: "compairing" → "comparing".

📝 Proposed fix

-Experiment to test Deterministic preprocessing, by compairing generated hash on 2 runs. +Experiment to test deterministic preprocessing by comparing generated hash on 2 runs.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Experiment to test Deterministic preprocessing, by compairing generated hash on 2 runs.

Experiment to test deterministic preprocessing by comparing generated hash on 2 runs.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@experiments/preprocessing_determinism/test_preprocessing.py` at line 10, Update the descriptive comment in experiments/preprocessing_determinism/test_preprocessing.py by fixing the typo: replace "compairing" with "comparing" in the sentence "Experiment to test Deterministic preprocessing, by compairing generated hash on 2 runs." so it reads "Experiment to test Deterministic preprocessing, by comparing generated hash on 2 runs." Ensure you edit the same comment or docstring where that phrase appears.

coderabbitai · 2026-03-11T09:03:45Z

experiments/preprocessing_determinism/test_preprocessing.py

+    # Run preprocessing
+    extract_text_from_xml(input_path)
+
+    #read genertaed manifest


⚠️ Potential issue | 🟡 Minor

Fix typo: "genertaed" → "generated".

📝 Proposed fix

- `#read` genertaed manifest + # Read generated manifest

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

#read genertaed manifest

# Read generated manifest

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@experiments/preprocessing_determinism/test_preprocessing.py` at line 21, Fix the typo in the inline comment "#read genertaed manifest" by changing it to "# read generated manifest" (or similar correct phrasing) so the comment referencing the manifest read is clear; locate the comment in experiments/preprocessing_determinism/test_preprocessing.py (look for the exact string "read genertaed manifest") and update it accordingly.

coderabbitai · 2026-03-11T09:03:45Z

experiments/preprocessing_determinism/test_preprocessing.py

+    logging.basicConfig(
+    level=logging.INFO,
+    format="%(levelname)s - %(message)s"
+    )


🧹 Nitpick | 🔵 Trivial

Fix inconsistent indentation.

The arguments to logging.basicConfig have inconsistent indentation relative to the function call.

📝 Proposed fix

logging.basicConfig( - level=logging.INFO, - format="%(levelname)s - %(message)s" + level=logging.INFO, + format="%(levelname)s - %(message)s" )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

logging.basicConfig(

level=logging.INFO,

format="%(levelname)s - %(message)s"

)

logging.basicConfig(

level=logging.INFO,

format="%(levelname)s - %(message)s"

)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@experiments/preprocessing_determinism/test_preprocessing.py` around lines 38 - 41, The call to logging.basicConfig has inconsistent indentation for its arguments; align the keyword arguments (level=logging.INFO, format="%(levelname)s - %(message)s") vertically under the function call so indentation is consistent with surrounding code (e.g., indent the two argument lines to match the opening parenthesis of logging.basicConfig). Locate the logging.basicConfig call in test_preprocessing.py and adjust the argument indentation so the block is consistently formatted.

coderabbitai · 2026-03-11T09:03:46Z

experiments/README.md

@@ -0,0 +1,100 @@
+## Experiments


⚠️ Potential issue | 🟡 Minor

Use a top-level heading (h1) as the first line.

The static analysis tool flagged that the first line should be a top-level heading. Change ## Experiments to # Experiments.

📝 Proposed fix

-## Experiments +# Experiments

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

## Experiments

# Experiments

🧰 Tools

🪛 markdownlint-cli2 (0.21.0)

[warning] 1-1: First line in a file should be a top-level heading

(MD041, first-line-heading, first-line-h1)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@experiments/README.md` at line 1, Change the first line heading from a level-2 to a level-1 heading by replacing "## Experiments" with "# Experiments" so the README starts with a top-level h1; update the header text at the top of the file (the "## Experiments" line) to "# Experiments".

coderabbitai · 2026-03-11T09:03:46Z

experiments/README.md

+sample_wiki.xml.bz2 (original)
+
+tampered_sample_wiki.xml.bz2 (modified)
+
+The experiment compares:
+
+raw_merkle_root
+
+processed_merkle_root
+


⚠️ Potential issue | 🟡 Minor

Format list items consistently using Markdown syntax.

Lines 69-78 are missing Markdown list formatting. They should use - or * prefixes for consistency with the rest of the document.

📝 Proposed fix

Two datasets are used: -sample_wiki.xml.bz2 (original) +- `sample_wiki.xml.bz2` (original) -tampered_sample_wiki.xml.bz2 (modified) +- `tampered_sample_wiki.xml.bz2` (modified) The experiment compares: -raw_merkle_root +- `raw_merkle_root` -processed_merkle_root +- `processed_merkle_root`

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

sample_wiki.xml.bz2 (original)

tampered_sample_wiki.xml.bz2 (modified)

The experiment compares:

raw_merkle_root

processed_merkle_root

Two datasets are used:

- `sample_wiki.xml.bz2` (original)

- `tampered_sample_wiki.xml.bz2` (modified)

The experiment compares:

- `raw_merkle_root`

- `processed_merkle_root`

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@experiments/README.md` around lines 69 - 78, Convert the plain text items into a proper Markdown list by prefixing each entry with a consistent list marker (e.g., "- "): update "sample_wiki.xml.bz2 (original)", "tampered_sample_wiki.xml.bz2 (modified)", and the two comparison keys "raw_merkle_root" and "processed_merkle_root" so they appear as list items under the sentence "The experiment compares:". Ensure the same marker is used for all four lines to match the document's Markdown style.

coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@experiments/merkle_verification/test_merkle.py`:
- Around line 34-46: The code has inconsistent spacing around assignment and
keyword-argument equals (e.g., "parser= argparse.ArgumentParser", "args=
parser.parse_args", and the dict-like kwargs in logging.basicConfig). Update
these lines so spacing is consistent: use a single space before and after
assignment operators (parser = argparse.ArgumentParser, args =
parser.parse_args) and standard spacing for keyword arguments in function calls
(description="...", level=logging.INFO, format="..."), touching the symbols
parser, argparse.ArgumentParser, parser.add_argument, args, parser.parse_args,
and logging.basicConfig to locate and fix them.
- Around line 19-30: The test fails because run() calls
extract_text_from_xml(path1) which defaults to write_manifest=False so no
manifest is written; update run in test_merkle.py to call
extract_text_from_xml(path1, write_manifest=True) (or otherwise ensure
extract_text_from_xml writes the manifest) so that the subsequent read of
MANIFEST_PATH and access to manifest["raw_merkle_root"] /
manifest["processed_merkle_root"] operates on a freshly written manifest.

In `@experiments/preprocessing_determinism/test_preprocessing.py`:
- Around line 44-45: The two assignment statements use no spaces around the "=";
update the assignments to follow PEP8 by adding spaces around the operator for
both variables (result1 and result2) when calling run(sys.argv[1]). Locate the
two lines that invoke run and assign to result1 and result2 and insert spaces
around the "=" so the assignments read with standard spacing.
- Around line 19-30: The test calls extract_text_from_xml(input_path) but that
function defaults to write_manifest=False so no manifest is produced; change the
call to pass write_manifest=True (i.e., extract_text_from_xml(input_path,
write_manifest=True)) so MANIFEST_PATH is created/updated before opening it,
ensuring the subsequent json.load uses the manifest just written (keep existing
MANIFEST_PATH usage and existing return keys).

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f71a6cfd-5251-4ae5-821d-a3c91d6cc7d3

📥 Commits

Reviewing files that changed from the base of the PR and between e7f133a and 996b5d1.

📒 Files selected for processing (2)

experiments/merkle_verification/test_merkle.py
experiments/preprocessing_determinism/test_preprocessing.py

experiments/merkle_verification/test_merkle.py

coderabbitai · 2026-03-11T09:15:45Z

experiments/merkle_verification/test_merkle.py

+    parser= argparse.ArgumentParser(
+        description= "Test tamper detection using Merkle root"
+    )
+
+    parser.add_argument("--path1",required=True,help="Original dataset")
+    parser.add_argument("--path2",required=True,help="Tampered dataset")
+
+    args= parser.parse_args()
+
+    logging.basicConfig(
+        level= logging.INFO,
+        format="%(levelname)s - %(message)s"
+    )


🧹 Nitpick | 🔵 Trivial

Inconsistent spacing around = operators.

Several lines have inconsistent spacing around = which reduces readability.

✏️ Proposed fix

- parser= argparse.ArgumentParser( - description= "Test tamper detection using Merkle root" + parser = argparse.ArgumentParser( + description="Test tamper detection using Merkle root" ) - parser.add_argument("--path1",required=True,help="Original dataset") - parser.add_argument("--path2",required=True,help="Tampered dataset") + parser.add_argument("--path1", required=True, help="Original dataset") + parser.add_argument("--path2", required=True, help="Tampered dataset") - args= parser.parse_args() + args = parser.parse_args() logging.basicConfig( - level= logging.INFO, + level=logging.INFO, format="%(levelname)s - %(message)s" )

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

parser= argparse.ArgumentParser(

description= "Test tamper detection using Merkle root"

)

parser.add_argument("--path1",required=True,help="Original dataset")

parser.add_argument("--path2",required=True,help="Tampered dataset")

args= parser.parse_args()

logging.basicConfig(

level= logging.INFO,

format="%(levelname)s - %(message)s"

)

parser = argparse.ArgumentParser(

description="Test tamper detection using Merkle root"

)

parser.add_argument("--path1", required=True, help="Original dataset")

parser.add_argument("--path2", required=True, help="Tampered dataset")

args = parser.parse_args()

logging.basicConfig(

level=logging.INFO,

format="%(levelname)s - %(message)s"

)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@experiments/merkle_verification/test_merkle.py` around lines 34 - 46, The code has inconsistent spacing around assignment and keyword-argument equals (e.g., "parser= argparse.ArgumentParser", "args= parser.parse_args", and the dict-like kwargs in logging.basicConfig). Update these lines so spacing is consistent: use a single space before and after assignment operators (parser = argparse.ArgumentParser, args = parser.parse_args) and standard spacing for keyword arguments in function calls (description="...", level=logging.INFO, format="..."), touching the symbols parser, argparse.ArgumentParser, parser.add_argument, args, parser.parse_args, and logging.basicConfig to locate and fix them.

coderabbitai · 2026-03-11T09:15:45Z

experiments/preprocessing_determinism/test_preprocessing.py

+    # Run preprocessing
+    extract_text_from_xml(input_path)
+
+    #read genertaed manifest
+    with MANIFEST_PATH.open() as f:
+        manifest = json.load(f)
+
+    return {
+        "processed_sha256": manifest["processed_sha256"],
+        "processed_merkle_root": manifest["processed_merkle_root"],
+        "environment_hash": manifest["environment_hash"],
+    }


⚠️ Potential issue | 🔴 Critical

Critical: Manifest is never written—experiment will fail or use stale data.

extract_text_from_xml defaults to write_manifest=False (see openverifiablellm/utils.py:159). Since no write_manifest=True is passed, the manifest file is never created/updated, and the subsequent read on line 23-24 will either raise FileNotFoundError or read stale data from a previous unrelated run.

🐛 Proposed fix

def run(input_path): # Run preprocessing - extract_text_from_xml(input_path) + extract_text_from_xml(input_path, write_manifest=True) - `#read` genertaed manifest + # Read generated manifest with MANIFEST_PATH.open() as f: manifest = json.load(f)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

# Run preprocessing

extract_text_from_xml(input_path)

#read genertaed manifest

with MANIFEST_PATH.open() as f:

manifest = json.load(f)

return {

"processed_sha256": manifest["processed_sha256"],

"processed_merkle_root": manifest["processed_merkle_root"],

"environment_hash": manifest["environment_hash"],

}

# Run preprocessing

extract_text_from_xml(input_path, write_manifest=True)

# Read generated manifest

with MANIFEST_PATH.open() as f:

manifest = json.load(f)

return {

"processed_sha256": manifest["processed_sha256"],

"processed_merkle_root": manifest["processed_merkle_root"],

"environment_hash": manifest["environment_hash"],

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@experiments/preprocessing_determinism/test_preprocessing.py` around lines 19 - 30, The test calls extract_text_from_xml(input_path) but that function defaults to write_manifest=False so no manifest is produced; change the call to pass write_manifest=True (i.e., extract_text_from_xml(input_path, write_manifest=True)) so MANIFEST_PATH is created/updated before opening it, ensuring the subsequent json.load uses the manifest just written (keep existing MANIFEST_PATH usage and existing return keys).

coderabbitai · 2026-03-11T09:15:45Z

experiments/preprocessing_determinism/test_preprocessing.py

+    result1= run(sys.argv[1])
+    result2= run(sys.argv[1])


🧹 Nitpick | 🔵 Trivial

Add spaces around assignment operators for consistency.

Minor style: PEP 8 recommends spaces around = in assignments.

✏️ Proposed fix

- result1= run(sys.argv[1]) - result2= run(sys.argv[1]) + result1 = run(sys.argv[1]) + result2 = run(sys.argv[1])

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

result1= run(sys.argv[1])

result2= run(sys.argv[1])

result1 = run(sys.argv[1])

result2 = run(sys.argv[1])

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@experiments/preprocessing_determinism/test_preprocessing.py` around lines 44 - 45, The two assignment statements use no spaces around the "="; update the assignments to follow PEP8 by adding spaces around the operator for both variables (result1 and result2) when calling run(sys.argv[1]). Locate the two lines that invoke run and assign to result1 and result2 and insert spaces around the "=" so the assignments read with standard spacing.

add experiments directory

e7f133a

github-actions bot added good first issue Good for newcomers backend documentation Improvements or additions to documentation python size/L external-contributor pending-coderabbit-review and removed size/L labels Mar 11, 2026

github-actions bot added size/L and removed size/L labels Mar 11, 2026

fix: organize imports in experiments scripts

0dcf633

coderabbitai bot requested changes Mar 11, 2026

View reviewed changes

github-actions bot added size/L and removed size/L labels Mar 11, 2026

Merge branch 'AOSSIE-Org:main' into experiments

996b5d1

github-actions bot added size/L and removed size/L labels Mar 11, 2026

coderabbitai bot requested changes Mar 11, 2026

View reviewed changes

		@@ -0,0 +1,20 @@
		import bz2

		# To make this tampered I deleted e of online

	# To make this tampered I deleted e of online
	# To make this tampered, I deleted 'e' from "online"

		with bz2.open("experiments/data_subset/tampered_sample_wiki.xml.bz2", "wt", encoding="utf-8") as f:
		f.write(xml_content) No newline at end of file

	Experiment to test Deterministic preprocessing, by compairing generated hash on 2 runs.
	Experiment to test deterministic preprocessing by comparing generated hash on 2 runs.

Uh oh!

Conversation

tani-dubey commented Mar 11, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Addressed Issues:

Additional Notes:

Checklist

⚠️ AI Notice - Important!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 11, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tani-dubey commented Mar 11, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 11, 2026 •

edited

Loading