Add MariaDB full-text search provider #39
Merged
Implements a new SearchLite provider backed by MariaDB, mirroring the
PostgreSQL provider's structure and the shared conformance test layout.
Source/SearchLite.MariaDb:
- SearchManager / SearchIndex implement the full ISearchEngineManager and
ISearchIndex<T> surface using the MySqlConnector async driver.
- One InnoDB table per index (id VARCHAR(255) PK, document JSON, search_text
TEXT, last_updated TIMESTAMP) with a FULLTEXT(search_text) index. Upserts
use INSERT ... ON DUPLICATE KEY UPDATE.
- Full-text search uses MATCH(search_text) AGAINST(? IN BOOLEAN MODE) for the
relevance score (boolean mode avoids natural-language mode's 50% rule, which
breaks the suite's tiny datasets). User queries are tokenized to alphanumeric
terms so operator characters cannot reach the parser; IncludePartialMatches
appends a trailing '*' wildcard (OR semantics), otherwise each term is
required ('+term', AND semantics).
- WhereClauseBuilder translates every FilterNode<T> operator into MariaDB JSON
predicates (JSON_EXTRACT/JSON_UNQUOTE, JSON_CONTAINS for equality and
collection containment), with ordering via JSON_EXTRACT and LIMIT/OFFSET
paging.
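The query handling described in the full-text bullet above can be sketched as follows. This is an illustrative Python sketch, not the provider's C# code: the function name `build_boolean_query` and the ASCII-only token pattern are assumptions made for the example.

```python
import re


def build_boolean_query(user_query: str, include_partial_matches: bool) -> str:
    """Turn free text into a MariaDB BOOLEAN MODE search expression.

    Hypothetical helper: the name and the ASCII-only token pattern are
    assumptions for illustration, not taken from the provider's source.
    """
    # Keep only runs of letters and digits, so boolean-mode operators such as
    # + - * " ( ) never reach the full-text parser.
    terms = re.findall(r"[A-Za-z0-9]+", user_query)
    if include_partial_matches:
        # Terms without a leading operator are optional (OR semantics);
        # the trailing '*' turns each one into a prefix match.
        return " ".join(f"{term}*" for term in terms)
    # A leading '+' marks a term as required (AND semantics).
    return " ".join(f"+{term}" for term in terms)
```

The resulting string would be bound as the single parameter of `MATCH(search_text) AGAINST(? IN BOOLEAN MODE)`, so user text is never concatenated into the SQL itself.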
Tests/SearchLite.MariaDb.Tests:
- MariaDbFixture (Testcontainers.MariaDb) starts the server with
innodb_ft_min_token_size=1 and ft_min_word_len=1 so short tokens are indexed
(the FULLTEXT index is created afterwards by the provider).
- Concrete IndexTests subclass runs the shared conformance suite, plus
TableNameTests and WhereClauseTests (26 infra-free unit tests, all passing).
Central package versions for MySqlConnector and Testcontainers.MariaDb added to
Directory.Packages.props; both projects added to the solution.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AHZB8AqzqRcBEuzRurzJFf
Verified the full conformance suite against a real MariaDB 11 server (142/142
passing).
Fixes:
- Score read: the no-query path selected integer 0 as the score, so
reader.GetDouble threw InvalidCastException on every filter-only search.
Select CAST(0 AS DOUBLE) and read the score tolerantly.
- Null-aware scalar accessor: JSON_UNQUOTE(JSON_EXTRACT(...)) returns the
literal text 'null' for a present-but-null field (breaking IS NULL /
IsNullOrEmpty / ordering / nested null guards), while JSON_VALUE collapses ""
to NULL and nulls out objects. Use a CASE on JSON_TYPE that yields SQL NULL
only for JSON null / missing and the unquoted scalar (incl. "") otherwise;
this is correct for scalars and for IS [NOT] NULL guards on objects.
- Emptiness via CHAR_LENGTH, not `= ''`: MySQL PAD SPACE collation treats " "
as equal to '', which made IsNullOrEmpty match whitespace.
- Type-aware ORDER BY: order on the scalar accessor (JSON null / missing sort
as SQL NULL) and CAST numeric fields so they sort numerically.
- Full-text partial matches are term-level OR of exact tokens, not trailing '*'
prefix matching (which wrongly matched "chars" for query "C#"), mirroring the
SQLite/Postgres semantics.
- search_text is LONGTEXT, not TEXT: TEXT truncates at 64KB and dropped content
(and FTS tokens) in the extremely-long-content case.
- Test fixture honors a SEARCHLITE_MARIADB_CONNSTR env var to run against an
already-running server (no Docker); CI still uses Testcontainers.
- Updated WhereClauseTests SQL assertions for the new accessor.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AHZB8AqzqRcBEuzRurzJFf
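The null-aware accessor and the CHAR_LENGTH emptiness check from this commit might generate SQL along the following lines. This is a hedged Python sketch: the helper names `scalar_accessor` and `is_null_or_empty` and the exact SQL text are assumptions for illustration, and the real WhereClauseBuilder output may differ.

```python
def scalar_accessor(json_path: str, column: str = "document") -> str:
    """Build an expression that is SQL NULL for a missing or JSON-null field
    and the unquoted scalar (including the empty string) otherwise.

    Hypothetical helper; json_path must come from trusted property names,
    never from user input, because it is interpolated into the SQL text.
    """
    extract = f"JSON_EXTRACT({column}, '{json_path}')"
    # JSON_EXTRACT returns SQL NULL for a missing path; JSON_TYPE reports
    # 'NULL' for a JSON null that is present in the document.
    return (
        f"CASE WHEN {extract} IS NULL OR JSON_TYPE({extract}) = 'NULL' "
        f"THEN NULL ELSE JSON_UNQUOTE({extract}) END"
    )


def is_null_or_empty(json_path: str) -> str:
    """Emptiness predicate that uses CHAR_LENGTH instead of comparing with an
    empty string, so a PAD SPACE collation cannot make ' ' count as empty."""
    accessor = scalar_accessor(json_path)
    return f"({accessor} IS NULL OR CHAR_LENGTH({accessor}) = 0)"
```

Calling `is_null_or_empty("$.name")` yields a predicate that holds for a missing field, a JSON null, and `""`, but not for a whitespace-only string.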
Implements the MariaDB provider, part of the multi-provider epic #23. Mirrors the Postgres provider and plugs into the shared `SearchLite.Conformance` suite. Chosen as one of the first new providers because its full-text search runs in the stock `mariadb` image with no custom infrastructure, so the full conformance suite runs in CI.

## What's here

- `Source/SearchLite.MariaDb`: `SearchManager` + `SearchIndex<T>` on `MySqlConnector`. One InnoDB table per index (`id VARCHAR(255)` PK, `document JSON`, `search_text TEXT`, `last_updated TIMESTAMP(6)`, utf8mb4) with a `FULLTEXT(search_text)` index. Full `ISearchIndex<T>` surface with `INSERT … ON DUPLICATE KEY UPDATE` upserts, batched `IndexManyAsync`, `JSON_EXTRACT` ordering, `LIMIT`/`OFFSET` paging.
- `WhereClauseBuilder<T>`: every `FilterNode<T>` operator → MariaDB JSON predicates (`JSON_UNQUOTE(JSON_EXTRACT(...))`, `CAST(… AS SIGNED/DECIMAL/DATETIME)` for ranges, `JSON_CONTAINS` for equality + `CollectionContains`, `LIKE` for string ops, `IN`/`NOT IN`), mirroring the Postgres builder.
- `Tests/SearchLite.MariaDb.Tests`: `Testcontainers.MariaDb` (`mariadb:11`) fixture, the concrete conformance subclass, `TableNameTests`, `WhereClauseTests`.
- `Directory.Packages.props`: `MySqlConnector` + `Testcontainers.MariaDb`. Both projects added to `SearchLite.sln`.

## Full-text semantics (chosen to match the conformance suite)

- Boolean mode (`MATCH … AGAINST(? IN BOOLEAN MODE)`), not natural-language mode: boolean mode has no 50%-of-rows term-ignore rule, which would otherwise break the suite's tiny (3-row) datasets.
- The test server is started with `innodb_ft_min_token_size=1` / `ft_min_word_len=1` (set at startup, before any index is built) so the suite's short tokens (`c`, `1`, `doc`) are indexed.
- `IncludePartialMatches=true` → trailing `*` per term, OR-ed; `false` → required (`+term`) AND semantics.
- User queries are tokenized to alphanumeric terms, so operator characters (`C#`, `SQL*`, `"Hello World"`) can't cause syntax errors.

## Verification

- … `WhereClauseTests` + `TableNameTests`).

## Note

The provider-specific `MinScore` test asserts only the deterministic `MinScore = 0.0` case, since MariaDB boolean-mode relevance isn't comparable to Postgres `ts_rank`, consistent with the cross-provider scoring caveat in epic #23.

🤖 Generated with Claude Code