Release 0.2.8: respect non-greedy in rstr, fixes kubernetes_secret_yaml degeneracy #16

Merged
slima4 merged 1 commit into main from release/0.2.8 on Apr 21, 2026

@slima4 slima4 commented Apr 21, 2026

Summary

  • Real fix for the last remaining 0.2.7 known limitation (kubernetes_secret_yaml degenerate samples).
  • Patches rstr.xeger.Xeger._handle_state so min_repeat emits exactly start_range repetitions — the semantically correct minimum for non-greedy quantifiers, matching how the re engine actually fills those regions against real text.
  • Measured effect: kubernetes_secret_yaml goes from 27 samples / ~10 shared middles to 30 samples / 29 unique middles. All 6 rules in the lumen-argus community.json regression set now produce real, diverse coverage.
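
The claim that the minimum is what the re engine fills against real text can be checked with the standard library alone. The snippet below is a small illustration, not the project's rule: the YAML text and the pattern are made up for this example and are not the actual kubernetes_secret_yaml regex.

```python
import re

# Hypothetical input and pattern, chosen only to show lazy vs. greedy filling.
text = "kind: Secret\ndata:\n  a: QQ==\n  b: Qg==\n"

# Non-greedy hole: the engine expands it one char at a time and stops at the
# first position where the rest of the pattern (": ") matches.
lazy = re.search(r"Secret((?s:.){0,200}?): ", text)

# Greedy hole: the engine takes as much as it can and backtracks to the last
# position where ": " still matches.
greedy = re.search(r"Secret((?s:.){0,200}): ", text)

lazy_hole = lazy.group(1)
greedy_hole = greedy.group(1)
print(repr(lazy_hole), len(lazy_hole))
print(repr(greedy_hole), len(greedy_hole))
```

The lazy hole ends at the first key, the greedy one at the last, so a generator that wants samples resembling what the non-greedy pattern captures has reason to prefer short fills.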

Root cause

rstr tags {m,n}?, *?, and +? as min_repeat and the greedy forms as max_repeat, then routes both through the same handler, which draws the repetition count uniformly from [m, n]. For (?s:.){0,200}? that produced ~100 random chars with 6% control-char bias (. samples from string.printable, which includes \v, \x0c, and \n), and the results blew past max_string_length=256 and were filtered out. The few survivors shared one degenerate middle, and stage-2 padding fanned that single base into a pseudo-corpus.
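
Both halves of this can be reproduced without rstr. The sketch below is illustrative and makes two assumptions: it reads the tags from the standard library's internal regex parser (`re._parser` on Python 3.11+, `sre_parse` before that), which is a private module, and `sample_hole` is a hypothetical helper that only mimics the two count policies, not rstr's actual handler.

```python
import random
import string

try:  # Python 3.11+ (private module, may change between releases)
    from re import _parser as sre_parser
except ImportError:  # older interpreters
    import sre_parse as sre_parser


def repeat_tag(pattern):
    """Return the opcode name the stdlib parser gives the first token."""
    op, _args = sre_parser.parse(pattern)[0]
    return op.name


def sample_hole(lo, hi, rng, non_greedy_aware):
    """Hypothetical helper: fill a {lo,hi}? hole under one of two policies."""
    # Non-greedy-aware policy emits exactly lo repetitions; the other policy
    # draws the count uniformly from [lo, hi].
    count = lo if non_greedy_aware else rng.randint(lo, hi)
    return "".join(rng.choice(string.printable) for _ in range(count))


rng = random.Random(0)
uniform_lengths = [len(sample_hole(0, 200, rng, False)) for _ in range(1000)]
minimal_lengths = [len(sample_hole(0, 200, rng, True)) for _ in range(1000)]
mean_uniform = sum(uniform_lengths) / len(uniform_lengths)

print(repeat_tag(r"(?s:.){0,200}?"), repeat_tag(r"(?s:.){0,200}"))
print(round(mean_uniform), max(minimal_lengths))
```

Under the uniform policy the hole averages about 100 characters before the rest of the pattern is added; under the minimal policy it contributes nothing beyond the lower bound.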

Test plan

  • Unit tests: tests/test_generator.py::TestNonGreedyRepeat (4 new tests including the kubernetes_secret_yaml regression).
  • Full suite: 272/272 passing locally (py3.14); CI will run py3.12 + 3.13.
  • ruff check, ruff format --check, mypy clean.
  • End-to-end on all 6 community.json target rules: all 30/30 unique middles.

…racy

rstr routes min_repeat (non-greedy *?, {m,n}?) through the same
handler as max_repeat and samples uniformly over [m, n], dropping
non-greedy semantics. For wide holes like (?s:.){0,200}? this
generated ~100 random chars with 6% control-char bias; results
overshot max_string_length=256 and were filtered, the few survivors
shared one degenerate middle, stage-2 padding inflated that single
base. Patch _handle_state so min_repeat emits exactly start_range
repetitions. kubernetes_secret_yaml: 27 samples/~10 middles ->
30 samples/30 middles.
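
The shape of that change can be sketched with a stand-in generator. `TinyGenerator` below is hypothetical and is not rstr or the actual patch. It assumes the standard library's private regex parser for the parse tree, handles only the few opcodes this example needs, and uses an arbitrary cap of 100 for unbounded greedy repeats.

```python
import random
import string

try:  # Python 3.11+ (private module, may change between releases)
    from re import _parser as sre_parser
except ImportError:  # older interpreters
    import sre_parse as sre_parser


class TinyGenerator:
    """Hypothetical stand-in for a regex-driven string generator."""

    def __init__(self, seed=0):
        self._random = random.Random(seed)

    def generate(self, pattern):
        return "".join(self._handle_state(s) for s in sre_parser.parse(pattern))

    def _handle_state(self, state):
        opcode, value = state
        name = opcode.name
        if name == "LITERAL":
            return chr(value)
        if name == "ANY":
            return self._random.choice(string.printable)
        if name == "SUBPATTERN":
            # Last element of the tuple is the nested sub-pattern.
            return "".join(self._handle_state(s) for s in value[-1])
        if name == "MIN_REPEAT":
            # Non-greedy: emit exactly the lower bound.
            lo, _hi, body = value
            count = lo
        elif name == "MAX_REPEAT":
            # Greedy: keep the uniform draw (cap of 100 is an assumption).
            lo, hi, body = value
            count = self._random.randint(lo, min(hi, 100))
        else:
            raise NotImplementedError(name)
        return "".join(
            self._handle_state(s) for _ in range(count) for s in body
        )


gen = TinyGenerator(seed=0)
lazy_sample = gen.generate(r"a(?s:.){0,200}?b")
lazy_bounded = gen.generate(r"x{3,9}?")
greedy_bounded = gen.generate(r"x{3,9}")
print(repr(lazy_sample), lazy_bounded, greedy_bounded)
```

With the non-greedy branch pinned to the lower bound, the wide hole collapses to nothing and the literal anchors around it survive intact, while greedy quantifiers keep their randomized length.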
@slima4 slima4 merged commit 5390dd0 into main Apr 21, 2026
3 checks passed
@slima4 slima4 deleted the release/0.2.8 branch April 21, 2026 20:53

codecov Bot commented Apr 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
