Skip to content

Add experiments for preprocessing determinism and Merkle verification#66

Open
tani-dubey wants to merge 3 commits intoAOSSIE-Org:mainfrom
tani-dubey:experiments
Open

Add experiments for preprocessing determinism and Merkle verification#66
tani-dubey wants to merge 3 commits intoAOSSIE-Org:mainfrom
tani-dubey:experiments

Conversation

@tani-dubey
Copy link
Contributor

@tani-dubey tani-dubey commented Mar 11, 2026

Addressed Issues:

Fixes #37

This PR adds experiments/ directory containing small reproducible experiments to validate assumptions about the OpenVerifiableLLM pipeline.

It has experiments :

  • preprocessing determinism experiment
    Verifies that running the preprocessing pipeline multiple times on the same dataset produces identical artifacts (processed_sha256, processed_merkle_root, environment_hash).

  • Merkle root tamper detection experiment
    Demonstrates how dataset modifications can be detected using Merkle roots.

  • small dataset subset for testing
    Provides a lightweight Wikipedia XML example to allow experiments to run quickly without requiring large dumps.

  • experiments README with usage instructions

Additional Notes:

While implementing the tamper detection experiment, I observed that the processed merkle_root alone is unable to show tamper for small change ( like removing or adding a single alphabet from sample.bz2 goes undetected).

To address this, the experiment compares both:

  • raw_merkle_root
  • processed_merkle_root

Checklist

  • My code follows the project's code style and conventions
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings or errors
  • I have joined the Discord server and I will share a link to this PR with the project maintainers there
  • I have read the Contributing Guidelines

⚠️ AI Notice - Important!

We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact.

Summary by CodeRabbit

  • Documentation

    • Added a comprehensive experiments guide with setup, quick-start commands, directory layout, and expected output formats.
  • New Features

    • Added experiment scripts for deterministic preprocessing checks, tamper-detection via cryptographic root comparison, and a generated tampered dataset sample for testing.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 11, 2026

Walkthrough

Adds an experiments/ directory with README and three CLI experiment scripts to validate preprocessing determinism, Merkle root tamper detection, and a tampered Wikipedia sample generator; scripts read/produce dataset manifests and compare hashes/roots for verification.

Changes

Cohort / File(s) Summary
Documentation
experiments/README.md
New README documenting experiments layout, three experiments (Preprocessing Determinism, Merkle Root Tamper Detection, Dataset Subset), run commands, expected manifest fields (processed_sha256, processed_merkle_root, raw_merkle_root, environment_hash).
Determinism Test
experiments/preprocessing_determinism/test_preprocessing.py
New CLI script that runs preprocessing twice on an XML input, reads data/dataset_manifest.json, and compares processed_sha256, processed_merkle_root, and environment_hash for determinism.
Merkle Verification
experiments/merkle_verification/test_merkle.py
New CLI script that preprocesses inputs (via extract_text_from_xml), loads manifest(s), prints raw_merkle_root and processed_merkle_root for two runs, and reports tampering if roots differ.
Tampered Sample Generator
experiments/data_subset/sample_wiki_generate.py
New script that writes a bz2-compressed small MediaWiki XML sample with intentional tampering for dataset subset experiments.

Sequence Diagram(s)

sequenceDiagram
  participant User as User (CLI)
  participant Preproc as Preprocessor (extract_text_from_xml)
  participant FS as Filesystem / Manifest (data/dataset_manifest.json)
  participant Comparator as Comparator

  User->>Preproc: provide dataset path
  Preproc->>FS: write/read processed artifacts + manifest
  FS-->>Preproc: return manifest with raw_merkle_root, processed_merkle_root
  Preproc->>User: return manifest fields
  User->>Comparator: provide two manifest results
  Comparator->>Comparator: compare raw_merkle_root and processed_merkle_root
  Comparator-->>User: report "tampering detected" or "identical"
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

Python Lang, Documentation

Suggested reviewers

  • Archit381

Poem

🐰 I hopped through manifests and roots so fine,
I nibbled a wiki, removed one little line.
I checked the hashes, compared them twice,
Found tamper tracks and said, "How nice!"
Hooray for experiments — crunchy, cryptic, divine.

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main changes: adding experiments for preprocessing determinism and Merkle verification, which are central to this PR.
Linked Issues check ✅ Passed The PR implements core objectives from issue #37: creates an experiments/ directory with three functional experiments (preprocessing determinism, Merkle tamper detection, dataset subset) plus README, validating determinism and verification assumptions as requested.
Out of Scope Changes check ✅ Passed All changes are scoped to the experiments/ directory and directly support the three proposed experiment categories from issue #37; no extraneous modifications to main codebase detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added size/L and removed size/L labels Mar 11, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@experiments/data_subset/sample_wiki_generate.py`:
- Line 3: Fix the minor typo in the inline comment in sample_wiki_generate.py:
replace the current comment "# To make this tampered I deleted e of online" with
a clearer phrasing such as "# To make this tampered I deleted the 'e' in
'online'." to correct grammar and clarify intent; update the comment near the
tampering logic or the top of the file where this sentence appears.
- Around line 19-20: The file currently performs file I/O (the with
bz2.open(...) block that writes xml_content to
"experiments/data_subset/tampered_sample_wiki.xml.bz2") at import time; move
that write block into an if __name__ == "__main__": guard so the write only runs
when the script is executed directly, ensuring xml_content is still defined or
constructed above the guard (or by a helper function you call from the guard)
and preventing side effects on import.

In `@experiments/merkle_verification/test_merkle.py`:
- Line 22: Update the comment in experiments/merkle_verification/test_merkle.py
that currently reads "#read genertaed manifest" to correct the typo to "#read
generated manifest" so the inline comment in the test function (around the
manifest-read logic) accurately reflects the operation.
- Around line 1-5: The import block is unsorted causing CI failure; reorder the
imports in experiments/merkle_verification/test_merkle.py so standard library
imports are alphabetized (e.g., argparse, json, logging, pathlib.Path) followed
by third-party/local imports like openverifiablellm.utils and its
extract_text_from_xml, or simply run the linter auto-fix (ruff --fix) to apply
the correct sort automatically.

In `@experiments/preprocessing_determinism/test_preprocessing.py`:
- Around line 38-41: The call to logging.basicConfig has inconsistent
indentation for its arguments; align the keyword arguments (level=logging.INFO,
format="%(levelname)s - %(message)s") vertically under the function call so
indentation is consistent with surrounding code (e.g., indent the two argument
lines to match the opening parenthesis of logging.basicConfig). Locate the
logging.basicConfig call in test_preprocessing.py and adjust the argument
indentation so the block is consistently formatted.
- Line 21: Fix the typo in the inline comment "#read genertaed manifest" by
changing it to "# read generated manifest" (or similar correct phrasing) so the
comment referencing the manifest read is clear; locate the comment in
experiments/preprocessing_determinism/test_preprocessing.py (look for the exact
string "read genertaed manifest") and update it accordingly.
- Line 10: Update the descriptive comment in
experiments/preprocessing_determinism/test_preprocessing.py by fixing the typo:
replace "compairing" with "comparing" in the sentence "Experiment to test
Deterministic preprocessing, by compairing generated hash on 2 runs." so it
reads "Experiment to test Deterministic preprocessing, by comparing generated
hash on 2 runs." Ensure you edit the same comment or docstring where that phrase
appears.
- Around line 1-5: The import block in test_preprocessing.py is unsorted;
reorder the imports to follow canonical groups and alphabetical order (stdlib
first: import json, import logging, import sys; then third-party/other: from
pathlib import Path; then local: from openverifiablellm.utils import
extract_text_from_xml), or simply run the linter auto-fix (ruff --fix) to apply
the correct sorting; update the import statement order in the module (look for
the top-of-file imports: sys, json, logging, Path, extract_text_from_xml)
accordingly.

In `@experiments/README.md`:
- Line 1: Change the first line heading from a level-2 to a level-1 heading by
replacing "## Experiments" with "# Experiments" so the README starts with a
top-level h1; update the header text at the top of the file (the "##
Experiments" line) to "# Experiments".
- Around line 69-78: Convert the plain text items into a proper Markdown list by
prefixing each entry with a consistent list marker (e.g., "- "): update
"sample_wiki.xml.bz2 (original)", "tampered_sample_wiki.xml.bz2 (modified)", and
the two comparison keys "raw_merkle_root" and "processed_merkle_root" so they
appear as list items under the sentence "The experiment compares:". Ensure the
same marker is used for all four lines to match the document's Markdown style.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 34e326dd-7c10-40cb-a918-21f1ebd91fd8

📥 Commits

Reviewing files that changed from the base of the PR and between a2dfd7a and e7f133a.

📒 Files selected for processing (4)
  • experiments/README.md
  • experiments/data_subset/sample_wiki_generate.py
  • experiments/merkle_verification/test_merkle.py
  • experiments/preprocessing_determinism/test_preprocessing.py

@@ -0,0 +1,20 @@
import bz2

# To make this tampered I deleted e of online
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Minor typo in comment.

📝 Proposed fix
-# To make this tampered I deleted e of online
+# To make this tampered, I deleted 'e' from "online"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# To make this tampered I deleted e of online
# To make this tampered, I deleted 'e' from "online"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/data_subset/sample_wiki_generate.py` at line 3, Fix the minor
typo in the inline comment in sample_wiki_generate.py: replace the current
comment "# To make this tampered I deleted e of online" with a clearer phrasing
such as "# To make this tampered I deleted the 'e' in 'online'." to correct
grammar and clarify intent; update the comment near the tampering logic or the
top of the file where this sentence appears.

Comment on lines +19 to +20
with bz2.open("experiments/data_subset/tampered_sample_wiki.xml.bz2", "wt", encoding="utf-8") as f:
f.write(xml_content) No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Wrap file-writing logic in an if __name__ == "__main__" guard.

The script executes file I/O at module import time. This can cause unintended side effects if the module is imported elsewhere. Adding a main guard is a best practice for executable scripts.

♻️ Proposed fix
-with bz2.open("experiments/data_subset/tampered_sample_wiki.xml.bz2", "wt", encoding="utf-8") as f:
-    f.write(xml_content)
+if __name__ == "__main__":
+    with bz2.open("experiments/data_subset/tampered_sample_wiki.xml.bz2", "wt", encoding="utf-8") as f:
+        f.write(xml_content)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
with bz2.open("experiments/data_subset/tampered_sample_wiki.xml.bz2", "wt", encoding="utf-8") as f:
f.write(xml_content)
if __name__ == "__main__":
with bz2.open("experiments/data_subset/tampered_sample_wiki.xml.bz2", "wt", encoding="utf-8") as f:
f.write(xml_content)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/data_subset/sample_wiki_generate.py` around lines 19 - 20, The
file currently performs file I/O (the with bz2.open(...) block that writes
xml_content to "experiments/data_subset/tampered_sample_wiki.xml.bz2") at import
time; move that write block into an if __name__ == "__main__": guard so the
write only runs when the script is executed directly, ensuring xml_content is
still defined or constructed above the guard (or by a helper function you call
from the guard) and preventing side effects on import.

"""Run preprocessing and return processed Merkle root."""
extract_text_from_xml(path1)

#read genertaed manifest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix typo: "genertaed" → "generated".

📝 Proposed fix
-    `#read` genertaed manifest
+    # Read generated manifest
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#read genertaed manifest
# Read generated manifest
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/merkle_verification/test_merkle.py` at line 22, Update the
comment in experiments/merkle_verification/test_merkle.py that currently reads
"#read genertaed manifest" to correct the typo to "#read generated manifest" so
the inline comment in the test function (around the manifest-read logic)
accurately reflects the operation.

logger = logging.getLogger(__name__)

"""
Experiment to test Deterministic preprocessing, by compairing generated hash on 2 runs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix typo: "compairing" → "comparing".

📝 Proposed fix
-Experiment to test Deterministic preprocessing, by compairing generated hash on 2 runs.
+Experiment to test deterministic preprocessing by comparing generated hash on 2 runs.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Experiment to test Deterministic preprocessing, by compairing generated hash on 2 runs.
Experiment to test deterministic preprocessing by comparing generated hash on 2 runs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/preprocessing_determinism/test_preprocessing.py` at line 10,
Update the descriptive comment in
experiments/preprocessing_determinism/test_preprocessing.py by fixing the typo:
replace "compairing" with "comparing" in the sentence "Experiment to test
Deterministic preprocessing, by compairing generated hash on 2 runs." so it
reads "Experiment to test Deterministic preprocessing, by comparing generated
hash on 2 runs." Ensure you edit the same comment or docstring where that phrase
appears.

# Run preprocessing
extract_text_from_xml(input_path)

#read genertaed manifest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix typo: "genertaed" → "generated".

📝 Proposed fix
-    `#read` genertaed manifest
+    # Read generated manifest
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#read genertaed manifest
# Read generated manifest
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/preprocessing_determinism/test_preprocessing.py` at line 21, Fix
the typo in the inline comment "#read genertaed manifest" by changing it to "#
read generated manifest" (or similar correct phrasing) so the comment
referencing the manifest read is clear; locate the comment in
experiments/preprocessing_determinism/test_preprocessing.py (look for the exact
string "read genertaed manifest") and update it accordingly.

Comment on lines +38 to +41
logging.basicConfig(
level=logging.INFO,
format="%(levelname)s - %(message)s"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Fix inconsistent indentation.

The arguments to logging.basicConfig have inconsistent indentation relative to the function call.

📝 Proposed fix
     logging.basicConfig(
-    level=logging.INFO,
-    format="%(levelname)s - %(message)s"
+        level=logging.INFO,
+        format="%(levelname)s - %(message)s"
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
logging.basicConfig(
level=logging.INFO,
format="%(levelname)s - %(message)s"
)
logging.basicConfig(
level=logging.INFO,
format="%(levelname)s - %(message)s"
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/preprocessing_determinism/test_preprocessing.py` around lines 38
- 41, The call to logging.basicConfig has inconsistent indentation for its
arguments; align the keyword arguments (level=logging.INFO,
format="%(levelname)s - %(message)s") vertically under the function call so
indentation is consistent with surrounding code (e.g., indent the two argument
lines to match the opening parenthesis of logging.basicConfig). Locate the
logging.basicConfig call in test_preprocessing.py and adjust the argument
indentation so the block is consistently formatted.

@@ -0,0 +1,100 @@
## Experiments
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use a top-level heading (h1) as the first line.

The static analysis tool flagged that the first line should be a top-level heading. Change ## Experiments to # Experiments.

📝 Proposed fix
-## Experiments
+# Experiments
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
## Experiments
# Experiments
🧰 Tools
🪛 markdownlint-cli2 (0.21.0)

[warning] 1-1: First line in a file should be a top-level heading

(MD041, first-line-heading, first-line-h1)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/README.md` at line 1, Change the first line heading from a
level-2 to a level-1 heading by replacing "## Experiments" with "# Experiments"
so the README starts with a top-level h1; update the header text at the top of
the file (the "## Experiments" line) to "# Experiments".

Comment on lines +69 to +78
sample_wiki.xml.bz2 (original)

tampered_sample_wiki.xml.bz2 (modified)

The experiment compares:

raw_merkle_root

processed_merkle_root

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Format list items consistently using Markdown syntax.

Lines 69-78 are missing Markdown list formatting. They should use - or * prefixes for consistency with the rest of the document.

📝 Proposed fix
 Two datasets are used:
 
-sample_wiki.xml.bz2 (original)
+- `sample_wiki.xml.bz2` (original)
 
-tampered_sample_wiki.xml.bz2 (modified)
+- `tampered_sample_wiki.xml.bz2` (modified)
 
 The experiment compares:
 
-raw_merkle_root
+- `raw_merkle_root`
 
-processed_merkle_root
+- `processed_merkle_root`
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
sample_wiki.xml.bz2 (original)
tampered_sample_wiki.xml.bz2 (modified)
The experiment compares:
raw_merkle_root
processed_merkle_root
Two datasets are used:
- `sample_wiki.xml.bz2` (original)
- `tampered_sample_wiki.xml.bz2` (modified)
The experiment compares:
- `raw_merkle_root`
- `processed_merkle_root`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/README.md` around lines 69 - 78, Convert the plain text items
into a proper Markdown list by prefixing each entry with a consistent list
marker (e.g., "- "): update "sample_wiki.xml.bz2 (original)",
"tampered_sample_wiki.xml.bz2 (modified)", and the two comparison keys
"raw_merkle_root" and "processed_merkle_root" so they appear as list items under
the sentence "The experiment compares:". Ensure the same marker is used for all
four lines to match the document's Markdown style.

@github-actions github-actions bot added size/L and removed size/L labels Mar 11, 2026
@github-actions github-actions bot added size/L and removed size/L labels Mar 11, 2026
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@experiments/merkle_verification/test_merkle.py`:
- Around line 34-46: The code has inconsistent spacing around assignment and
keyword-argument equals (e.g., "parser= argparse.ArgumentParser", "args=
parser.parse_args", and the dict-like kwargs in logging.basicConfig). Update
these lines so spacing is consistent: use a single space before and after
assignment operators (parser = argparse.ArgumentParser, args =
parser.parse_args) and standard spacing for keyword arguments in function calls
(description="...", level=logging.INFO, format="..."), touching the symbols
parser, argparse.ArgumentParser, parser.add_argument, args, parser.parse_args,
and logging.basicConfig to locate and fix them.
- Around line 19-30: The test fails because run() calls
extract_text_from_xml(path1) which defaults to write_manifest=False so no
manifest is written; update run in test_merkle.py to call
extract_text_from_xml(path1, write_manifest=True) (or otherwise ensure
extract_text_from_xml writes the manifest) so that the subsequent read of
MANIFEST_PATH and access to manifest["raw_merkle_root"] /
manifest["processed_merkle_root"] operates on a freshly written manifest.

In `@experiments/preprocessing_determinism/test_preprocessing.py`:
- Around line 44-45: The two assignment statements use no spaces around the "=";
update the assignments to follow PEP8 by adding spaces around the operator for
both variables (result1 and result2) when calling run(sys.argv[1]). Locate the
two lines that invoke run and assign to result1 and result2 and insert spaces
around the "=" so the assignments read with standard spacing.
- Around line 19-30: The test calls extract_text_from_xml(input_path) but that
function defaults to write_manifest=False so no manifest is produced; change the
call to pass write_manifest=True (i.e., extract_text_from_xml(input_path,
write_manifest=True)) so MANIFEST_PATH is created/updated before opening it,
ensuring the subsequent json.load uses the manifest just written (keep existing
MANIFEST_PATH usage and existing return keys).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f71a6cfd-5251-4ae5-821d-a3c91d6cc7d3

📥 Commits

Reviewing files that changed from the base of the PR and between e7f133a and 996b5d1.

📒 Files selected for processing (2)
  • experiments/merkle_verification/test_merkle.py
  • experiments/preprocessing_determinism/test_preprocessing.py

Comment on lines +34 to +46
parser= argparse.ArgumentParser(
description= "Test tamper detection using Merkle root"
)

parser.add_argument("--path1",required=True,help="Original dataset")
parser.add_argument("--path2",required=True,help="Tampered dataset")

args= parser.parse_args()

logging.basicConfig(
level= logging.INFO,
format="%(levelname)s - %(message)s"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Inconsistent spacing around = operators.

Several lines have inconsistent spacing around = which reduces readability.

✏️ Proposed fix
-    parser= argparse.ArgumentParser(
-        description= "Test tamper detection using Merkle root"
+    parser = argparse.ArgumentParser(
+        description="Test tamper detection using Merkle root"
     )
     
-    parser.add_argument("--path1",required=True,help="Original dataset")
-    parser.add_argument("--path2",required=True,help="Tampered dataset")
+    parser.add_argument("--path1", required=True, help="Original dataset")
+    parser.add_argument("--path2", required=True, help="Tampered dataset")
     
-    args= parser.parse_args()
+    args = parser.parse_args()
     
     logging.basicConfig(
-        level= logging.INFO,
+        level=logging.INFO,
         format="%(levelname)s - %(message)s"
     )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
parser= argparse.ArgumentParser(
description= "Test tamper detection using Merkle root"
)
parser.add_argument("--path1",required=True,help="Original dataset")
parser.add_argument("--path2",required=True,help="Tampered dataset")
args= parser.parse_args()
logging.basicConfig(
level= logging.INFO,
format="%(levelname)s - %(message)s"
)
parser = argparse.ArgumentParser(
description="Test tamper detection using Merkle root"
)
parser.add_argument("--path1", required=True, help="Original dataset")
parser.add_argument("--path2", required=True, help="Tampered dataset")
args = parser.parse_args()
logging.basicConfig(
level=logging.INFO,
format="%(levelname)s - %(message)s"
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/merkle_verification/test_merkle.py` around lines 34 - 46, The
code has inconsistent spacing around assignment and keyword-argument equals
(e.g., "parser= argparse.ArgumentParser", "args= parser.parse_args", and the
dict-like kwargs in logging.basicConfig). Update these lines so spacing is
consistent: use a single space before and after assignment operators (parser =
argparse.ArgumentParser, args = parser.parse_args) and standard spacing for
keyword arguments in function calls (description="...", level=logging.INFO,
format="..."), touching the symbols parser, argparse.ArgumentParser,
parser.add_argument, args, parser.parse_args, and logging.basicConfig to locate
and fix them.

Comment on lines +19 to +30
# Run preprocessing
extract_text_from_xml(input_path)

#read genertaed manifest
with MANIFEST_PATH.open() as f:
manifest = json.load(f)

return {
"processed_sha256": manifest["processed_sha256"],
"processed_merkle_root": manifest["processed_merkle_root"],
"environment_hash": manifest["environment_hash"],
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Manifest is never written—experiment will fail or use stale data.

extract_text_from_xml defaults to write_manifest=False (see openverifiablellm/utils.py:159). Since no write_manifest=True is passed, the manifest file is never created/updated, and the subsequent read on line 23-24 will either raise FileNotFoundError or read stale data from a previous unrelated run.

🐛 Proposed fix
 def run(input_path):
     # Run preprocessing
-    extract_text_from_xml(input_path)
+    extract_text_from_xml(input_path, write_manifest=True)
     
-    `#read` genertaed manifest
+    # Read generated manifest
     with MANIFEST_PATH.open() as f:
         manifest = json.load(f)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Run preprocessing
extract_text_from_xml(input_path)
#read genertaed manifest
with MANIFEST_PATH.open() as f:
manifest = json.load(f)
return {
"processed_sha256": manifest["processed_sha256"],
"processed_merkle_root": manifest["processed_merkle_root"],
"environment_hash": manifest["environment_hash"],
}
# Run preprocessing
extract_text_from_xml(input_path, write_manifest=True)
# Read generated manifest
with MANIFEST_PATH.open() as f:
manifest = json.load(f)
return {
"processed_sha256": manifest["processed_sha256"],
"processed_merkle_root": manifest["processed_merkle_root"],
"environment_hash": manifest["environment_hash"],
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/preprocessing_determinism/test_preprocessing.py` around lines 19
- 30, The test calls extract_text_from_xml(input_path) but that function
defaults to write_manifest=False so no manifest is produced; change the call to
pass write_manifest=True (i.e., extract_text_from_xml(input_path,
write_manifest=True)) so MANIFEST_PATH is created/updated before opening it,
ensuring the subsequent json.load uses the manifest just written (keep existing
MANIFEST_PATH usage and existing return keys).

Comment on lines +44 to +45
result1= run(sys.argv[1])
result2= run(sys.argv[1])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Add spaces around assignment operators for consistency.

Minor style: PEP 8 recommends spaces around = in assignments.

✏️ Proposed fix
-    result1= run(sys.argv[1])
-    result2= run(sys.argv[1])
+    result1 = run(sys.argv[1])
+    result2 = run(sys.argv[1])
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
result1= run(sys.argv[1])
result2= run(sys.argv[1])
result1 = run(sys.argv[1])
result2 = run(sys.argv[1])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@experiments/preprocessing_determinism/test_preprocessing.py` around lines 44
- 45, The two assignment statements use no spaces around the "="; update the
assignments to follow PEP8 by adding spaces around the operator for both
variables (result1 and result2) when calling run(sys.argv[1]). Locate the two
lines that invoke run and assign to result1 and result2 and insert spaces around
the "=" so the assignments read with standard spacing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[GOOD FIRST ISSUE]: Add experiments/ folder to validate determinism and verification assumptions

1 participant