test(store): cover normalizeFTSQuery edge cases and document FTS5 tokenizer #45
Open
mvanhorn wants to merge 1 commit into steipete:main from
Conversation
Summary
Adds focused unit and end-to-end tests for `normalizeFTSQuery`, and a one-line doc comment above each FTS5 `create virtual table` site noting the default `unicode61` tokenizer and the input-normalization contract.

Why this matters
Issue #9 raised three asks: parameterize FTS queries, document the tokenizer choice, and add edge-case tests. Two of those are already in place:
- `internal/store/query.go:75-76`, `query.go:427`, and `members_profile.go:138-139` all pass user input via `match ?`.
- `normalizeFTSQuery` at `internal/store/query.go:893` wraps each whitespace-separated field in double quotes after stripping inner quotes, so `AND`, `OR`, `NOT`, `NEAR`, and `*` become literal terms rather than FTS5 syntax.

What was missing:
- Direct unit tests for `normalizeFTSQuery`. The closest coverage in `store_test.go` (`TestSearchFallbackFilters`, `TestStoreReadWriteAndSearch`) only uses simple non-operator queries, so operator-as-literal behavior was not asserted.
- The tokenizer choice (`unicode61`) is implicit. A future contributor inspecting the schema would have to consult the SQLite docs to see what tokenization rules apply.

This PR locks in the existing sanitizer with regression tests and documents the tokenizer choice. The framing matches the precedent set by #3, which was closed with "that should be a narrower regression-test issue rather than this broad one."
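For illustration, the quoting contract described above can be sketched as follows. This is a hypothetical reimplementation, not the code at `internal/store/query.go:893`:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeFTSQuery (sketch): splits the query on whitespace, strips any
// embedded double quotes from each field, and wraps each field in double
// quotes. Quoted fields are plain terms to FTS5, so operators like
// AND/OR/NOT/NEAR and the * prefix marker are matched literally.
func normalizeFTSQuery(q string) string {
	fields := strings.Fields(q)
	quoted := make([]string, 0, len(fields))
	for _, f := range fields {
		f = strings.ReplaceAll(f, `"`, "") // strip inner quotes
		if f == "" {
			continue // field was nothing but quotes
		}
		quoted = append(quoted, `"`+f+`"`)
	}
	return strings.Join(quoted, " ")
}

func main() {
	fmt.Println(normalizeFTSQuery(`foo AND bar*`)) // "foo" "AND" "bar*"
	fmt.Println(normalizeFTSQuery(`  he"llo  `))   // "hello"
	fmt.Println(normalizeFTSQuery(""))             // empty string
}
```

Under this contract an attacker-supplied `NOT` or `NEAR` can never change query semantics, which is why the PR only needs tests, not a sanitizer change.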
Changes
- `internal/store/store_test.go`: appends `TestNormalizeFTSQueryEdgeCases` (table-driven; covers empty/whitespace, single/multi-word, `AND`/`OR`/`NOT`/`NEAR` as terms, embedded double quotes, `*` as literal, mixed punctuation, unicode) and `TestSearchMessagesTreatsFTSOperatorsAsLiterals` (end-to-end; queries `"AND"` and asserts that only messages whose content contains the token match, not the FTS5 boolean).
- `internal/store/store.go` (lines ~394 and ~575) and `internal/store/members_profile.go` (line ~60): one-line comment above each `create virtual table ... using fts5(...)` block noting the default `unicode61` tokenizer and pointing readers at `normalizeFTSQuery`.

No change to `normalizeFTSQuery`, no change to the FTS5 schema (no `tokenize=` clause added), no change to any caller.

Testing
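As a sketch of what such a schema comment could look like (the table and column names below are hypothetical, not taken from the repo):

```sql
-- FTS5 full-text index using the default unicode61 tokenizer (no tokenize=
-- clause). All MATCH input must be sanitized via normalizeFTSQuery first.
create virtual table messages_fts using fts5(content);
```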
Local CI gate, all green:
- `gofumpt -l .` (clean)
- `go vet ./...`
- `staticcheck ./...`
- `golangci-lint run`
- `gosec -exclude=G101,G115,G202,G301,G304 ./...`
- `go test -count=1 ./...`
- `go test -count=1 -race ./internal/store/...`

Diff: 3 files changed, +92 / -0.
Fixes #9