Add MariaDB full-text search provider #39

Merged
mhelleborg merged 2 commits into main from claude/provider-mariadb-k9u9fc
Jul 2, 2026
@mhelleborg

Owner

Implements the MariaDB provider — part of the multi-provider epic #23. Mirrors the Postgres provider and plugs into the shared SearchLite.Conformance suite. Chosen as one of the first new providers because its full-text search runs in the stock mariadb image with no custom infrastructure, so the full conformance suite runs in CI.

What's here

  • Source/SearchLite.MariaDb: SearchManager + SearchIndex<T> on MySqlConnector. One InnoDB table per index (id VARCHAR(255) PK, document JSON, search_text TEXT, last_updated TIMESTAMP(6), utf8mb4) with a FULLTEXT(search_text) index. Full ISearchIndex<T> surface with INSERT … ON DUPLICATE KEY UPDATE upserts, batched IndexManyAsync, JSON_EXTRACT ordering, and LIMIT/OFFSET paging.
  • WhereClauseBuilder<T> — every FilterNode<T> operator → MariaDB JSON predicates (JSON_UNQUOTE(JSON_EXTRACT(...)), CAST(… AS SIGNED/DECIMAL/DATETIME) for ranges, JSON_CONTAINS for equality + CollectionContains, LIKE for string ops, IN/NOT IN), mirroring the Postgres builder.
  • Tests/SearchLite.MariaDb.Tests: Testcontainers.MariaDb (mariadb:11) fixture, the concrete conformance subclass, TableNameTests, WhereClauseTests.
  • Directory.Packages.props: MySqlConnector + Testcontainers.MariaDb. Both projects added to SearchLite.sln.
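
The table layout and upsert described above could look roughly like the following. This is a minimal sketch, not the provider's actual code: the class name `MariaDbSchemaSketch`, the `Quote` guard, the parameter names, and the column nullability and defaults are all assumptions made for illustration. Only the column list, the FULLTEXT index, InnoDB, utf8mb4, and the `INSERT … ON DUPLICATE KEY UPDATE` shape come from the description.

```csharp
using System;

public static class MariaDbSchemaSketch
{
    // Hypothetical guard: reject names that could break out of the backtick quoting.
    private static string Quote(string table)
    {
        if (string.IsNullOrEmpty(table) || table.Contains("`"))
            throw new ArgumentException("Invalid table name.", nameof(table));
        return "`" + table + "`";
    }

    // One InnoDB table per index, with a FULLTEXT index on search_text.
    // NOT NULL / DEFAULT choices are assumptions, not taken from the PR.
    public static string CreateTableSql(string table) =>
        "CREATE TABLE IF NOT EXISTS " + Quote(table) + " (\n" +
        "  id VARCHAR(255) NOT NULL PRIMARY KEY,\n" +
        "  document JSON NOT NULL,\n" +
        "  search_text TEXT,\n" +
        "  last_updated TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP(6),\n" +
        "  FULLTEXT (search_text)\n" +
        ") ENGINE=InnoDB DEFAULT CHARSET=utf8mb4";

    // Upsert keyed on the primary key; @id, @document and @search_text are
    // named parameters to be bound by the driver.
    public static string UpsertSql(string table) =>
        "INSERT INTO " + Quote(table) + " (id, document, search_text, last_updated)\n" +
        "VALUES (@id, @document, @search_text, CURRENT_TIMESTAMP(6))\n" +
        "ON DUPLICATE KEY UPDATE\n" +
        "  document = VALUES(document),\n" +
        "  search_text = VALUES(search_text),\n" +
        "  last_updated = VALUES(last_updated)";
}
```

The sketch only builds SQL text, so it can be inspected without a database. Executing it would require a MySqlConnector connection and bound parameters.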

Full-text semantics (chosen to match the conformance suite)

  • Boolean mode (MATCH … AGAINST(? IN BOOLEAN MODE)), not natural-language mode — boolean mode has no 50%-of-rows term-ignore rule, which would otherwise break the suite's tiny (3-row) datasets.
  • Fixture sets innodb_ft_min_token_size=1 / ft_min_word_len=1 at startup (before any index is built) so the suite's short tokens (c, 1, doc) are indexed.
  • IncludePartialMatches=true → trailing * per term, OR-ed; false → required (+term) AND semantics.
  • Query strings are tokenized to alphanumeric/Unicode-letter terms before reaching the boolean parser, so operator chars (C#, SQL*, "Hello World") can't cause syntax errors.
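
The tokenize-then-build step in the bullets above can be sketched as follows. This is an illustrative sketch under assumptions, not the provider's implementation: `BooleanQuerySketch`, `Tokenize`, and `Build` are hypothetical names, and `char.IsLetterOrDigit` is assumed to be a reasonable stand-in for "alphanumeric/Unicode-letter". Note that the follow-up commit in this PR later replaces the trailing `*` with an OR of exact tokens.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class BooleanQuerySketch
{
    // Split the raw query into runs of letters/digits. Every other character
    // acts as a separator, so boolean-mode operators (+ - * " ( ) ~ < > @)
    // never reach MATCH ... AGAINST.
    public static List<string> Tokenize(string query)
    {
        var terms = new List<string>();
        var current = new StringBuilder();
        foreach (var ch in query ?? string.Empty)
        {
            if (char.IsLetterOrDigit(ch))
            {
                current.Append(ch);
            }
            else if (current.Length > 0)
            {
                terms.Add(current.ToString());
                current.Clear();
            }
        }
        if (current.Length > 0) terms.Add(current.ToString());
        return terms;
    }

    // includePartialMatches = true  -> "term*" per token; bare terms are optional (OR).
    // includePartialMatches = false -> "+term" per token; every term is required (AND).
    public static string Build(string query, bool includePartialMatches)
    {
        var terms = Tokenize(query);
        return includePartialMatches
            ? string.Join(" ", terms.Select(t => t + "*"))
            : string.Join(" ", terms.Select(t => "+" + t));
    }
}
```

For example, `Build("C# \"Hello World\"", false)` yields `+C +Hello +World`. The resulting string would still be passed as a bound parameter to `AGAINST(? IN BOOLEAN MODE)` rather than concatenated into the SQL.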

Verification

  • Builds clean in Release across net8.0/net9.0/net10.0.
  • 26 infra-free unit tests pass (WhereClauseTests + TableNameTests).
  • The conformance suite runs via Testcontainers in CI (Docker isn't available locally, so it wasn't run here) and is auto-discovered by the parallel CI matrix.

Note

The provider-specific MinScore test asserts only the deterministic MinScore = 0.0 case, since MariaDB boolean-mode relevance isn't comparable to Postgres ts_rank — consistent with the cross-provider scoring caveat in epic #23.

🤖 Generated with Claude Code

claude added 2 commits June 27, 2026 23:19
Implements a new SearchLite provider backed by MariaDB, mirroring the
PostgreSQL provider's structure and the shared conformance test layout.

Source/SearchLite.MariaDb:
- SearchManager / SearchIndex implement the full ISearchEngineManager and
  ISearchIndex<T> surface using the MySqlConnector async driver.
- One InnoDB table per index (id VARCHAR(255) PK, document JSON, search_text
  TEXT, last_updated TIMESTAMP) with a FULLTEXT(search_text) index. Upserts
  use INSERT ... ON DUPLICATE KEY UPDATE.
- Full-text search uses MATCH(search_text) AGAINST(? IN BOOLEAN MODE) for the
  relevance score (boolean mode avoids natural-language mode's 50% rule, which
  breaks the suite's tiny datasets). User queries are tokenized to alphanumeric
  terms so operator characters cannot reach the parser; IncludePartialMatches
  appends a trailing '*' wildcard (OR semantics), otherwise each term is
  required ('+term', AND semantics).
- WhereClauseBuilder translates every FilterNode<T> operator into MariaDB JSON
  predicates (JSON_EXTRACT/JSON_UNQUOTE, JSON_CONTAINS for equality and
  collection containment), with ordering via JSON_EXTRACT and LIMIT/OFFSET
  paging.

Tests/SearchLite.MariaDb.Tests:
- MariaDbFixture (Testcontainers.MariaDb) starts the server with
  innodb_ft_min_token_size=1 and ft_min_word_len=1 so short tokens are indexed
  (the FULLTEXT index is created afterwards by the provider).
- Concrete IndexTests subclass runs the shared conformance suite, plus
  TableNameTests and WhereClauseTests (26 infra-free unit tests, all passing).

Central package versions for MySqlConnector and Testcontainers.MariaDb added to
Directory.Packages.props; both projects added to the solution.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AHZB8AqzqRcBEuzRurzJFf

Verified the full conformance suite against a real MariaDB 11 server
(142/142 passing). Fixes:

- Score read: the no-query path selected integer 0 as the score, so
  reader.GetDouble threw InvalidCastException on every filter-only search.
  Select CAST(0 AS DOUBLE) and read the score tolerantly.
- Null-aware scalar accessor: JSON_UNQUOTE(JSON_EXTRACT(...)) returns the
  literal text 'null' for a present-but-null field (breaking IS NULL /
  IsNullOrEmpty / ordering / nested null guards), while JSON_VALUE collapses
  "" to NULL and nulls out objects. Use a CASE on JSON_TYPE that yields SQL
  NULL only for JSON null / missing and the unquoted scalar (incl. "")
  otherwise — correct for scalars and for IS [NOT] NULL guards on objects.
- Emptiness via CHAR_LENGTH, not `= ''`: MySQL PAD SPACE collation treats
  "   " as equal to '', which made IsNullOrEmpty match whitespace.
- Type-aware ORDER BY: order on the scalar accessor (JSON null / missing
  sort as SQL NULL) and CAST numeric fields so they sort numerically.
- Full-text partial matches are term-level OR of exact tokens, not trailing
  '*' prefix matching (which wrongly matched "chars" for query "C#"),
  mirroring the SQLite/Postgres semantics.
- search_text is LONGTEXT, not TEXT: TEXT truncates at 64KB and dropped
  content (and FTS tokens) in the extremely-long-content case.
- Test fixture honors a SEARCHLITE_MARIADB_CONNSTR env var to run against an
  already-running server (no Docker); CI still uses Testcontainers.
- Updated WhereClauseTests SQL assertions for the new accessor.
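
The null-aware accessor and the CHAR_LENGTH emptiness check described in the fixes above might be generated along these lines. This is a hedged sketch: `JsonAccessorSketch`, `Scalar`, and `IsNullOrEmpty` are hypothetical names, the exact SQL the provider emits is not shown in this PR text, and the JSON path is assumed to come from a property expression rather than from user input.

```csharp
using System;

public static class JsonAccessorSketch
{
    // SQL NULL for a missing field or a JSON null; otherwise the unquoted
    // scalar, with "" preserved as an empty string.
    // Assumes jsonPath is a trusted path such as "$.name".
    public static string Scalar(string jsonPath)
    {
        var extract = "JSON_EXTRACT(document, '" + jsonPath + "')";
        return "CASE WHEN " + extract + " IS NULL OR JSON_TYPE(" + extract + ") = 'NULL' " +
               "THEN NULL ELSE JSON_UNQUOTE(" + extract + ") END";
    }

    // CHAR_LENGTH counts characters, so a whitespace-only value is not treated
    // as empty, unlike `= ''` under a PAD SPACE collation.
    public static string IsNullOrEmpty(string jsonPath)
    {
        var scalar = Scalar(jsonPath);
        return "(" + scalar + " IS NULL OR CHAR_LENGTH(" + scalar + ") = 0)";
    }
}
```

The CASE expression is repeated inside the emptiness predicate, which keeps the sketch simple. A real builder might factor it differently.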

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01AHZB8AqzqRcBEuzRurzJFf
@mhelleborg mhelleborg merged commit 75b4c68 into main Jul 2, 2026
9 checks passed