Expand test coverage across nodes, lowfat, and TSV #141
Open
jonathanrobie wants to merge 1 commit into
New tests catch bugs fixed by PRs #136, #137, #140 and enforce constraints going forward.

test_nodes.py:
- test_file_is_nfc: all text must be Unicode NFC (enforces PR #136)
- test_no_cgj_anywhere: no CGJ (U+034F) in any file (enforces PR #136)
- test_m_xml_id_format: xml:id must be o<digits> (with optional ה)
- test_m_lemma_not_empty: every <m> has non-empty @lemma
- test_m_morph_not_empty: every <m> has non-empty @morph
- test_m_after_valid_values: @after must be from the known valid set
- test_non_final_word_last_m_has_after: non-final words have @after (enforces PR #137)

test_lowfat.py:
- test_file_is_nfc: all text must be Unicode NFC
- test_no_cgj_anywhere: no CGJ in any file
- test_w_lemma_not_empty: every <w> has non-empty @lemma
- test_w_after_not_missing: every <w> has an @after attribute

test_tsv.py:
- test_tsv_is_nfc: all text must be Unicode NFC
- test_tsv_no_cgj: no CGJ in the TSV
- test_tsv_xml_id_format: xml:id matches expected format
- test_tsv_ref_format: ref matches USFM pattern
- test_tsv_after_valid_values: after column from known valid set
- test_tsv_lemma_not_empty: every row has non-empty lemma

Note: CGJ, NFC, and lowfat @after tests will fail until PRs #136, #137 are merged and lowfat/TSV are regenerated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Adds tests that catch the bugs fixed by PRs #136, #137, and #140, and enforce data quality constraints going forward.
New tests
test_nodes.py
- test_file_is_nfc: all text must be Unicode NFC
- test_no_cgj_anywhere: no CGJ (U+034F) anywhere in the file
- test_m_xml_id_format: xml:id must be o<digits> (with an optional ה suffix for subsumed definite articles)
- test_m_lemma_not_empty: every <m> has a non-empty @lemma
- test_m_morph_not_empty: every <m> has a non-empty @morph
- test_m_after_valid_values: @after must be from the known valid set
- test_non_final_word_last_m_has_after: the last morpheme of each non-final orthographic word has a non-empty @after

test_lowfat.py
- test_file_is_nfc: all text must be Unicode NFC
- test_no_cgj_anywhere: no CGJ anywhere in the file
- test_w_lemma_not_empty: every <w> has a non-empty @lemma
- test_w_after_not_missing: every <w> has an @after attribute

test_tsv.py
- test_tsv_is_nfc: all text must be Unicode NFC
- test_tsv_no_cgj: no CGJ in the TSV
- test_tsv_xml_id_format: xml:id matches the expected format
- test_tsv_ref_format: ref matches the USFM pattern
- test_tsv_after_valid_values: the after column is from the known valid set
- test_tsv_lemma_not_empty: every row has a non-empty lemma

Expected failures until other PRs merge
- test_file_is_nfc, test_no_cgj_anywhere (nodes)
- test_file_is_nfc, test_no_cgj_anywhere, test_w_after_not_missing (lowfat)
- test_tsv_is_nfc, test_tsv_no_cgj
- test_non_final_word_last_m_has_after (nodes)

🤖 Generated with Claude Code
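
For reviewers who want a feel for what the NFC, CGJ, and xml:id checks listed above amount to, here is a minimal sketch using only the Python standard library. It is not the code in this PR: the helper names (`is_nfc`, `has_cgj`, `is_valid_m_xml_id`) are hypothetical, and the regex assumes that "o<digits>" means one or more ASCII digits, since the description does not state a fixed width.

```python
import re
import unicodedata

# U+034F COMBINING GRAPHEME JOINER, the character the CGJ tests reject.
CGJ = "\u034f"

# Assumption: "o<digits>" is "o" followed by one or more ASCII digits,
# optionally followed by the Hebrew letter he (U+05D4). The real tests
# may use a stricter pattern (for example a fixed digit count).
M_XML_ID = re.compile(r"o[0-9]+\u05d4?")


def is_nfc(text: str) -> bool:
    """Return True if text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


def has_cgj(text: str) -> bool:
    """Return True if text contains a Combining Grapheme Joiner."""
    return CGJ in text


def is_valid_m_xml_id(value: str) -> bool:
    """Return True if value matches the assumed o<digits>[ה] shape."""
    return M_XML_ID.fullmatch(value) is not None
```

A test in the style of this PR would presumably read each data file, call helpers like these on its text or attribute values, and report the offending locations on failure. Comparing against `unicodedata.normalize` rather than using `unicodedata.is_normalized` keeps the sketch usable on Python versions older than 3.8.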