Release 0.2.8: respect non-greedy in rstr, fixes kubernetes_secret_yaml degeneracy #16
Merged
…racy
rstr routes min_repeat (non-greedy *?, {m,n}?) through the same
handler as max_repeat and samples uniformly over [m, n], dropping
non-greedy semantics. For wide holes like (?s:.){0,200}? this
generated ~100 random chars with 6% control-char bias; results
overshot max_string_length=256 and were filtered; the few survivors
shared one degenerate middle, and stage-2 padding inflated that single
base. Patch _handle_state so min_repeat emits exactly start_range
repetitions. kubernetes_secret_yaml: 27 samples/~10 middles ->
30 samples/30 middles.
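
The behaviour change described in this commit message can be sketched without rstr, using only the standard-library regex parser. This is a minimal illustration under stated assumptions, not the patched `_handle_state`: the helpers `repeat_count` and `top_level_repeats` and the `cap` parameter are hypothetical names introduced here, and the real handler may differ in detail.

```python
import random

try:  # Python 3.11+ keeps the regex parser under re._parser
    from re import _parser as sre_parse
except ImportError:  # older interpreters expose it as a top-level module
    import sre_parse


def repeat_count(op, lo, hi, rng, cap=100):
    """Hypothetical policy: minimum count for non-greedy, uniform draw for greedy.

    `cap` is an assumed bound so unbounded forms such as `*` stay finite.
    """
    if op == sre_parse.MIN_REPEAT:
        return lo  # non-greedy: emit exactly the lower bound
    return rng.randint(lo, max(lo, min(hi, cap)))  # greedy: uniform over [lo, capped hi]


def top_level_repeats(pattern):
    """Yield (opcode, lo, hi) for each top-level quantifier in `pattern`."""
    for op, arg in sre_parse.parse(pattern):
        if op in (sre_parse.MIN_REPEAT, sre_parse.MAX_REPEAT):
            lo, hi, _subpattern = arg
            yield op, lo, hi


rng = random.Random(0)
for pattern in ("(?s:.){0,200}?", "(?s:.){0,200}"):
    for op, lo, hi in top_level_repeats(pattern):
        print(pattern, op, repeat_count(op, lo, hi, rng))
```

The parser already distinguishes the two forms (MIN_REPEAT for `{0,200}?`, MAX_REPEAT for `{0,200}`), so the policy only has to branch on the opcode.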
Codecov Report ✅ All modified and coverable lines are covered by tests.
Summary
- … kubernetes_secret_yaml degenerate samples).
- Patch rstr.xeger.Xeger._handle_state so min_repeat emits exactly start_range repetitions: the semantically correct minimum for non-greedy quantifiers, matching how the re engine actually fills those regions against real text.
- kubernetes_secret_yaml goes from 27 samples / ~10 shared middles to 30 samples / 29 unique middles. All 6 rules in the lumen-argus community.json regression set now produce real, diverse coverage.
Root cause
rstr tags {m,n}? / *? / +? as min_repeat and greedy forms as max_repeat, then routes both through the same handler and draws the count uniformly from [m, n]. For (?s:.){0,200}? that produced ~100 random chars with 6% control-char bias (. samples from string.printable, including \v, \x0c, \n), blowing past max_string_length=256; the few survivors shared one degenerate middle and stage-2 padding fanned that single base into a pseudo-corpus.
Test plan
- tests/test_generator.py::TestNonGreedyRepeat (4 new tests including the kubernetes_secret_yaml regression).
- ruff check, ruff format --check, mypy clean.
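
The figures quoted under Root cause can be sanity-checked with the standard library alone. The sketch below is not part of the PR's test suite. It assumes the "~100 chars" figure is the mean of a uniform draw over [0, 200] and that the 6% figure corresponds to the whitespace share of string.printable; both are inferences from the description. The sample text and the `key:` / `value:` tokens are made up for illustration.

```python
import re
import string

# Mean length of a uniform integer draw over [0, 200].
lo, hi = 0, 200
mean_len = (lo + hi) / 2

# Share of string.printable that is whitespace (space, \t, \n, \r, \x0b, \x0c).
ws_share = len(string.whitespace) / len(string.printable)

# Against real text, a non-greedy hole stops at the first closing token,
# so it is filled with as few characters as the match allows.
text = "key: AAAA\nnoise\nvalue: BBBB\nvalue: CCCC"
match = re.search(r"key:(?s:.){0,200}?value:", text)
hole = match.group(0)[len("key:"):-len("value:")]

print(mean_len, ws_share, repr(hole))
```

The last check shows why emitting the minimum count is a reasonable stand-in for non-greedy semantics: the engine consumes only what is needed to reach the next literal, never a uniformly random amount.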