Expand test coverage across nodes, lowfat, and TSV #141

Open
jonathanrobie wants to merge 1 commit into main from add/expanded-test-coverage

@jonathanrobie
Contributor

Summary

Adds tests that catch the bugs fixed by PRs #136, #137, and #140, and enforce data quality constraints going forward.

New tests

test_nodes.py

  • test_file_is_nfc — all text must be Unicode NFC
  • test_no_cgj_anywhere — no CGJ (U+034F) anywhere in file
  • test_m_xml_id_format — xml:id must be o<digits> (with optional ה suffix for subsumed definite articles)
  • test_m_lemma_not_empty — every <m> has non-empty @lemma
  • test_m_morph_not_empty — every <m> has non-empty @morph
  • test_m_after_valid_values — @after must be from the known valid set
  • test_non_final_word_last_m_has_after — last morpheme of each non-final orthographic word has non-empty @after
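
The text-level checks in this list can be sketched as small predicates. This is a minimal illustration and not the code in this PR: the helper names are hypothetical, and the regular expression is only one reading of "o<digits> with optional ה suffix".

```python
import re
import unicodedata

CGJ = "\u034f"  # COMBINING GRAPHEME JOINER

# Assumed reading of the xml:id rule: "o", one or more ASCII digits,
# then an optional Hebrew letter he (U+05D4).
M_ID_RE = re.compile(r"o[0-9]+\u05d4?")


def is_nfc(text: str) -> bool:
    """True when the text is unchanged by NFC normalization."""
    return unicodedata.normalize("NFC", text) == text


def has_cgj(text: str) -> bool:
    """True when the text contains at least one CGJ character."""
    return CGJ in text


def is_valid_m_id(xml_id: str) -> bool:
    """True when the whole id matches the assumed pattern."""
    return M_ID_RE.fullmatch(xml_id) is not None
```

In a real test these predicates would be applied to the file contents and to each <m> element's xml:id, so that a failure can report the offending value.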

test_lowfat.py

  • test_file_is_nfc — all text must be Unicode NFC
  • test_no_cgj_anywhere — no CGJ anywhere in file
  • test_w_lemma_not_empty — every <w> has non-empty @lemma
  • test_w_after_not_missing — every <w> has an @after attribute
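
The two attribute checks differ in a way that is easy to miss: test_w_after_not_missing only requires that @after exist (an empty value is acceptable), while test_w_lemma_not_empty requires a non-empty value. A minimal sketch of that distinction follows; the sample XML, the function names, and the assumption that <w> elements carry no namespace are all illustrative and not taken from this PR.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real lowfat files are larger and may differ in structure.
SAMPLE = """<sentence>
  <w lemma="a" after="">first</w>
  <w lemma="b">second</w>
  <w lemma="" after=" ">third</w>
</sentence>"""


def words_missing_after(root):
    """Text of every <w> that has no @after attribute at all."""
    return [w.text for w in root.iter("w") if "after" not in w.attrib]


def words_with_empty_lemma(root):
    """Text of every <w> whose @lemma is absent, empty, or whitespace only."""
    return [w.text for w in root.iter("w") if not w.get("lemma", "").strip()]


root = ET.fromstring(SAMPLE)
print(words_missing_after(root))
print(words_with_empty_lemma(root))
```

Returning the offending elements instead of a bare boolean keeps assertion messages useful when a regenerated file fails.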

test_tsv.py

  • test_tsv_is_nfc — all text must be Unicode NFC
  • test_tsv_no_cgj — no CGJ in the TSV
  • test_tsv_xml_id_format — xml:id matches expected format
  • test_tsv_ref_format — ref matches USFM pattern
  • test_tsv_after_valid_values — after column from known valid set
  • test_tsv_lemma_not_empty — every row has non-empty lemma
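
A row-level validator along these lines might look like the sketch below. The column names come from the list above, but their order, the ref pattern, and the set of valid after values are placeholders chosen for illustration; the actual pattern and set used by the tests are not stated in this description.

```python
import csv
import io
import re

# Hypothetical stand-ins for the real USFM pattern and the known valid set.
REF_RE = re.compile(r"[A-Z0-9]{3} [0-9]+:[0-9]+![0-9]+")
VALID_AFTER = {"", " ", "\u05be"}  # nothing, space, maqaf


def check_rows(tsv_text):
    """Return (line_number, column) pairs for every failed check."""
    problems = []
    reader = csv.DictReader(
        io.StringIO(tsv_text, newline=""),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,
    )
    # The header is line 1, so data rows start at line 2.
    for line_no, row in enumerate(reader, start=2):
        if not REF_RE.fullmatch(row["ref"]):
            problems.append((line_no, "ref"))
        if row["after"] not in VALID_AFTER:
            problems.append((line_no, "after"))
        if not row["lemma"].strip():
            problems.append((line_no, "lemma"))
    return problems


SAMPLE = (
    "xml:id\tref\tafter\tlemma\n"
    "o010010010011\tGEN 1:1!1\t\tlemma1\n"
    "o010010010012\tGEN 1:1!1\t \t\n"
    "o010010010021\tGenesis 1:1\tX\tlemma3\n"
)
print(check_rows(SAMPLE))
```

Quoting is disabled with csv.QUOTE_NONE so that a stray quotation mark in a field is read literally and cannot swallow the following tabs.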

Expected failures until other PRs merge

| Test | Passes after |
| --- | --- |
| test_file_is_nfc, test_no_cgj_anywhere (nodes) | PR #136 merges |
| test_file_is_nfc, test_no_cgj_anywhere, test_w_after_not_missing (lowfat) | PRs #136 + #137 merge + lowfat regenerated |
| test_tsv_is_nfc, test_tsv_no_cgj | PRs #136 + #137 merge + TSV regenerated |
| test_non_final_word_last_m_has_after (nodes) | PR #137 merges |

🤖 Generated with Claude Code

New tests catch bugs fixed by PRs #136, #137, #140 and enforce
constraints going forward.

test_nodes.py:
- test_file_is_nfc: all text must be Unicode NFC (enforces PR #136)
- test_no_cgj_anywhere: no CGJ (U+034F) in any file (enforces PR #136)
- test_m_xml_id_format: xml:id must be o<digits> (with optional ה)
- test_m_lemma_not_empty: every <m> has non-empty @lemma
- test_m_morph_not_empty: every <m> has non-empty @morph
- test_m_after_valid_values: @after must be from the known valid set
- test_non_final_word_last_m_has_after: non-final words have @after
  (enforces PR #137)

test_lowfat.py:
- test_file_is_nfc: all text must be Unicode NFC
- test_no_cgj_anywhere: no CGJ in any file
- test_w_lemma_not_empty: every <w> has non-empty @lemma
- test_w_after_not_missing: every <w> has an @after attribute

test_tsv.py:
- test_tsv_is_nfc: all text must be Unicode NFC
- test_tsv_no_cgj: no CGJ in the TSV
- test_tsv_xml_id_format: xml:id matches expected format
- test_tsv_ref_format: ref matches USFM pattern
- test_tsv_after_valid_values: after column from known valid set
- test_tsv_lemma_not_empty: every row has non-empty lemma

Note: CGJ, NFC, and lowfat @after tests will fail until PRs #136,
#137 are merged and lowfat/TSV are regenerated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>