Add malformed XML test and fix verify.py regression#49
Add malformed XML test and fix verify.py regression#49Shubhamx404 wants to merge 4 commits intoAOSSIE-Org:mainfrom
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (2)
WalkthroughMake Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip Try Coding Plans. Let us write the prompt for your AI agent so you can ship faster (with fewer bugs). Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openverifiablellm/verify.py`:
- Around line 413-419: The manifest check for "chunk_size_bytes" should mirror
the behavior used for other optional fields: if "chunk_size_bytes" is present
call _check_field as currently done; otherwise add a CheckStatus.SKIP entry to
the report (use the same report key "manifest_chunk_size_bytes" and a brief
detail like "Merkle chunk size not present in manifest") so the absence is
explicitly recorded; update the code around the _check_field call that
references manifest["chunk_size_bytes"],
reproduced_manifest.get("chunk_size_bytes"), and the report to include this SKIP
branch.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 291635d0-780c-4aa3-a6ed-a177b1b9e64b
📒 Files selected for processing (2)
openverifiablellm/verify.pytests/test_util.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@openverifiablellm/verify.py`:
- Around line 420-425: The new SKIP branch for manifest_chunk_size_bytes is
untested; update the existing test that checks
raw_merkle_root/processed_merkle_root to also assert that report contains a
CheckResult with name "manifest_chunk_size_bytes", status CheckStatus.SKIP, and
detail "Field absent from manifest (older version)". Locate where the test
inspects the report (the test calling verify and examining report.results) and
add an assertion that at least one entry matches CheckResult.name ==
"manifest_chunk_size_bytes" and CheckResult.status == CheckStatus.SKIP and
CheckResult.detail contains "Field absent from manifest (older version)" so
regressions on this legacy path are caught.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4eab8d5c-5c9b-4641-82c9-24be9e520269
📒 Files selected for processing (1)
openverifiablellm/verify.py
|
hey @Archit381 can you please review this pr |
Addressed Issues:
Fixes #48
Summary
Added a missing edge case test for the preprocessing pipeline to ensure that
extract_text_from_xmlfails fast when it encounters malformed XML. In the process, a regression in the verification test suite (openverifiablellm/verify.py) was also fixed to allow all pipeline checks to pass successfully.Problem
Currently, there are no tests enforcing the failure state for corrupted XML inputs. If a anyone accidentally suppresses the
ET.ParseErrorduring a refactor, it will silently pass also , feeding corrupted data into theverify.pypipeline causes a native Python crash, rather than gracefully reporting a pipeline failure.he pipeline would silently ignore corrupted files and produce incomplete outputs
Problem in
verify.pywhen
verify.pywas used against manifest or when the pipeline hit corrupted data, the script crashed withUnboundLocalErrorinstead of produce Verification report.Solution
Created
test_extract_text_from_xml_malformed_xmlthis genrate a broken XML file and feeds it to the pipeline while strictly asserting pytest.raises(ET.ParseError).Fixed verification crash for file
verify.pyScreenshots/Recordings:
dataset_manifest.jsonin the data/ folder. The Expected values are read from this manifest, while the Actual values are computed in real time from the file currently passed to the script (e.g., invalid_wikipedia_dump.xml)Checklist
We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact.
Summary by CodeRabbit
Bug Fixes
Tests