Add regexp raw pattern API#1061
Conversation
keegancsmith
left a comment
There was a problem hiding this comment.
I'm a bit unsure about this, so I haven't deeply reviewed the regexp.go code change. As such I am requesting changes. If you are keen on this direction then re request review and I will review properly.
internal/syntaxutil is a part of the stdlib's regexp/syntax code with a commit reverted that gave us a performance problem. See README.md from internal/syntaxutil
We shouldn't just be modifying it. If we do we need to use a pattern which makes it clear how we changed it. EG put the new functions in a different file + include comments in the original file indicating changes. Finally you also need to update the README.
Why do you need to make this change out of interest. In the original thread about this issue I noted we had OOMs in Sourcegraph due to not using this function. Did you find even after using this function we still had OOMs and needed to "unsimplify".
How did ya test this. Did you have a reproduction before and after? In particular I wonder if our issue is infact that Simplify does a simplification we don't like / we don't think makes sense. Is the issue then not in simplify? EG we have other bits of code which take syntax.Regexp and construct matchtrees. I worry those bad patterns affect that code path as well.
| } | ||
| } | ||
|
|
||
| func TestRegexpStringCompactOptionalRepeatNonGreedy(t *testing.T) { |
There was a problem hiding this comment.
This test and the above are so similiar it makes it hard to tell the difference. Can you maybe add a table test which tests what happens after simplify? Or maybe in the below table test you have want and wantSimplify?
| } | ||
|
|
||
| // RegexpString returns the marshaled raw regexp pattern for q. | ||
| func (q *Regexp) RegexpString() string { |
There was a problem hiding this comment.
This change makes sense no matter what. We want a way to get at this. I do think in Sourcegraph though we want to be able to directly use syntaxutil.RegexpString. We had a need to marshal the string for observability, which is outside of the need of marshalling a query.Regexp
| }{ | ||
| { | ||
| name: "simple literal", | ||
| q: &Regexp{Regexp: mustParseRE(`abc`)}, |
There was a problem hiding this comment.
minor. your input table should just be the regex string. The extra noise of constructing regexp here makes the case harder to read. Also the subtest name seems not that useful over the actual input. Just include the input in the t.Fatal
| if got := tt.q.RegexpString(); got != tt.want { | ||
| t.Fatalf("RegexpString() = %q, want %q", got, tt.want) | ||
| } | ||
| if got, want := tt.q.String(), `regex:"`+tt.want+`"`; got != want { |
There was a problem hiding this comment.
low value assertation, just focus on regexpstring
| } | ||
| } | ||
|
|
||
| func TestRegexpStringIsRawPattern(t *testing.T) { |
| } | ||
| } | ||
|
|
||
| func TestRegexpStringBoundedRepeatCompiles(t *testing.T) { |
There was a problem hiding this comment.
Ok so this test better demonstrates the fix you are adding. Are you sure the better fix isn't "undoing" the optimize step which creates the large pattern? Or does that have potential perf impacts I suppose and we should just focus on marshalling?
eac2b70 to
37db178
Compare
|
@keegancsmith Thanks for the review. I scaled this back to the minimal Zoekt change. |
Some downstream callers need to marshal or recompile query regexps and were reaching into
q.Regexp.String(), which is the underlying syntax tree debug string rather than an official query API.This adds
query.Regexp.RegexpString()as the public API for getting the marshaled raw regexp pattern. Existingquery.Regexpstring and gob serialization paths now route through that method.query.Regexp.String()remains query/debug formatting and still includes wrappers likeregex:"...".