fix: Limit relevant fields in default wildcard search #768

Open
LarsV123 wants to merge 2 commits into main from larsv/2026/03/18-fix-clause-count-issue

LarsV123 commented Mar 18, 2026

NOTE: This is a possible fix to address the issue described in https://sikt.atlassian.net/browse/NP-50823. Because it changes default behavior significantly, it will likely break something unexpected and should be tested carefully.

Problem

The CROSS_FIELDS multi-match query in searchAllWithBoostsQuery used "*" (wildcard) for the default field set, which OpenSearch expands to every field in the index (~433 text fields in prod). Combined with Operator.AND, this creates roughly tokens × fields boolean clauses: one clause per (token, field) pair.

The .limit(7) caps the number of space-separated words, but the standard tokenizer also splits on hyphens, so a query like "Genome-wide association meta-analysis..." (7 words) produces 9 or more tokens. With ~433 fields, that is roughly 3,900 clauses at 9 tokens and over 4,300 at 10, so such queries hit the maxClauseCount limit of 4096. Production has more dynamically mapped fields than test, which is why the same queries work in test but fail in prod.
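
The arithmetic above can be sketched in a few lines of Java. This is only an approximation under stated assumptions: `ClauseCountEstimate`, `approximateTokens`, and `estimateClauses` are hypothetical names, the regex split is a rough stand-in for the standard tokenizer (not its real implementation), and the example query is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the codebase: estimates tokens x fields.
final class ClauseCountEstimate {

    // Limit quoted in the PR description.
    static final int MAX_CLAUSE_COUNT = 4096;

    // Rough stand-in for the standard tokenizer: split on any run of
    // characters that are neither letters nor digits (so hyphens split words).
    static List<String> approximateTokens(String query) {
        return Arrays.stream(query.split("[^\\p{L}\\p{N}]+"))
            .filter(token -> !token.isEmpty())
            .collect(Collectors.toList());
    }

    // One boolean clause per (token, field) pair.
    static int estimateClauses(String query, int fieldCount) {
        return approximateTokens(query).size() * fieldCount;
    }

    public static void main(String[] args) {
        // Made-up query: 7 space-separated words, 10 tokens after hyphen splitting.
        String query = "long-term follow-up of low-dose aspirin in adults";
        for (int fieldCount : new int[] {433, 15}) {
            int clauses = estimateClauses(query, fieldCount);
            System.out.println(fieldCount + " fields: " + clauses + " clauses, over limit: "
                + (clauses > MAX_CLAUSE_COUNT));
        }
    }
}
```

With these assumptions, the same seven-word query costs 4,330 clauses against 433 fields but only 150 against 15 fields.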

Fix

Replaced the "*" wildcard with an explicit DEFAULT_SEARCH_ALL_FIELDS map containing 15 curated fields that are meaningful for free-text publication search:

  • Title (boosted to PI), abstract, tags
  • Identifiers (identifier, publisher ID, DOI)
  • Contributor names
  • Journal/publisher names
  • Affiliation labels (Norwegian, English, Nynorsk — text fields for proper analyzer support)
  • Funding identifiers

Clause count: 15 fields × ~10 tokens = ~150 clauses (vs 4000+ before), well within the 4096 limit — even with hyphenated queries.

When users explicitly specify the fields parameter (NODES_SEARCHED), the behavior is unchanged — only the default "search all" case is affected.
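
A minimal sketch of the selection logic described above, not the actual change in this PR. The field paths and the `fieldsToSearch` helper are hypothetical, the list is deliberately shorter than the 15 curated fields, and reading "boosted to PI" as `Math.PI` is an assumption.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of "curated defaults unless fields are given".
final class DefaultSearchFields {

    // Field name -> boost. All paths below are made up for illustration.
    static final Map<String, Float> DEFAULT_SEARCH_ALL_FIELDS = buildDefaults();

    private static Map<String, Float> buildDefaults() {
        Map<String, Float> fields = new LinkedHashMap<>();
        fields.put("mainTitle", (float) Math.PI); // assumption: PI means Math.PI
        fields.put("abstract", 1.0f);
        fields.put("tags", 1.0f);
        fields.put("doi", 1.0f);
        fields.put("contributorNames", 1.0f);
        fields.put("publisherName", 1.0f);
        fields.put("fundingIdentifier", 1.0f);
        return Collections.unmodifiableMap(fields);
    }

    // Explicitly requested fields win; the curated map only replaces
    // the old "*" default.
    static Map<String, Float> fieldsToSearch(Map<String, Float> requestedFields) {
        if (requestedFields == null || requestedFields.isEmpty()) {
            return DEFAULT_SEARCH_ALL_FIELDS;
        }
        return requestedFields;
    }
}
```

Keeping the defaults in an ordered, unmodifiable map makes the clause budget easy to reason about: the number of entries is the field count in the tokens × fields estimate.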

Alternative: copy_to field

If the curated field list turns out to be too restrictive, a more robust long-term option is to add a copy_to directive in the index mapping that copies all searchable fields into a single combined text field (e.g. _search_all). The multi-match query would then target just 1 field instead of N, eliminating the clause explosion entirely while still supporting true all-field search. The trade-off is that it requires a mapping change and a full re-index.
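
To make the copy_to idea concrete, here is a sketch that builds such a mapping as nested Java maps, which could then be serialized to JSON. It assumes flat, top-level text fields only; `CopyToMappingSketch` and `buildMapping` are hypothetical names, and real mappings with nested objects would need more structure.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each searchable text field copies into one combined field.
final class CopyToMappingSketch {

    // Combined field name taken from the example in the PR description.
    static final String COMBINED_FIELD = "_search_all";

    // Mapping for one source field: a text field whose values are also
    // copied into the combined field at index time.
    static Map<String, Object> textFieldCopiedToCombined() {
        Map<String, Object> field = new LinkedHashMap<>();
        field.put("type", "text");
        field.put("copy_to", COMBINED_FIELD);
        return field;
    }

    // Builds {"properties": {...}} for the given flat field names.
    static Map<String, Object> buildMapping(List<String> searchableFields) {
        Map<String, Object> properties = new LinkedHashMap<>();
        for (String name : searchableFields) {
            properties.put(name, textFieldCopiedToCombined());
        }
        // The combined field itself is a plain text field.
        properties.put(COMBINED_FIELD, Map.of("type", "text"));
        return Map.of("properties", properties);
    }
}
```

A default search would then target only `_search_all`, so the clause count grows with the number of tokens alone.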

github-actions bot commented Mar 18, 2026

Test Results

   43 files  ±0     43 suites  ±0   2m 52s ⏱️ +28s
1 082 tests ±0  1 079 ✅ ±0  3 💤 ±0  0 ❌ ±0 
1 165 runs  ±0  1 162 ✅ ±0  3 💤 ±0  0 ❌ ±0 

Results for commit e0680a9. ± Comparison against base commit d389989.

This pull request removes 5 and adds 4 tests. Note that renamed tests count towards both.

no.unit.nva.indexingclient.models.IndexDocumentTest ‑ should throw exception when validating and missing mandatory fields:IndexDocument[consumptionAttributes=EventConsumptionAttributes[index=RiRo0cAjkw1phZKdQRh, documentIdentifier=null], resource={"twwQCSSTbo":{"iOEmReH7O34y":"xtlW0xhdlTqX6vN2Qn","dwdAepKgirrt":"GYMCcEcwxE5kpUqe","F8exwJV4wMvQvo":"D4W8q7Gkn6quvy1t","5DAkS3H3pWDlhIDa":"4aPqclfgzFon","g3yAc9FtFfnqqDm":"qyunYN3ZmiMw"},"mg1b6bdpm7C6SdHYo61":{"7pvtkV9dUwT5y0":"UlLUsrFA2wKws452fM","MwCQDlHvb5fCLDcjZ":"IZJagLVI0Uen","yPPgkopN41wD":"Hij1APg2a775OQntlSp","aMTnhfr3p3pmmow":"cpbcRnbX5Yef","wGyuW20MT4":"bYoNQW8eRweZ"},"P…
no.unit.nva.indexingclient.models.IndexDocumentTest ‑ should throw exception when validating and missing mandatory fields:IndexDocument[consumptionAttributes=EventConsumptionAttributes[index=null, documentIdentifier=019cfab99532-a13e1ff3-53e6-4f45-be65-17c95161f946], resource={"J49Ha19N6aH5D3Z":{"yrwjVBXLA0nmaNykfTE":"xrds1T4jKLtpHP74Ho","zYDBAprz5s5I3":"bQPidZmVIo10u","te0yDp0ja3ou":"4d76SeHjuMyeiVudGFE","Wkjb0MzDpP3XZtVb49":"KPCixq2VBgK64P","FpCQZSi9tAIdPPov":"NgTJtM01gl6"},"05VzfquETdABDp":{"CiuogEZcvXAHalHj":"Lqyl13Wntd","SkaJQHSu2iGP":"2l31YuG29Ah2wOI","m0fVuHs03ds":"CdM9yzK0Fxxrr7","ph532adp1Jtld2BrPN":"5ymdQ9M95ok","Mi…
no.unit.nva.search.resource.ResourceClientAllScientificValuesTest ‑ [1] { "onlineIssn": "1903-6523" }
no.unit.nva.search.resource.ResourceClientAllScientificValuesTest ‑ [2] { "printIssn": "1903-6523" }
no.unit.nva.indexingclient.models.IndexDocumentTest ‑ should throw exception when validating and missing mandatory fields:IndexDocument[consumptionAttributes=EventConsumptionAttributes[index=Vr2yCMA4c0Xu, documentIdentifier=null], resource={"EjlRx6YliuwM80wUW":{"R5wOG9QmdQT9":"L2VtIgZRUY9k","UehlDqseP0bTDmoPfw":"fpXoPfQrylnt9N","ZNizEjaDNATNhBPWZ":"yyh26DgpEMTg4pbO4u","qKAFU4HoMo2Izc":"0xmC9IvFDbO","TAXRhEq9OHkKIRVBh":"PsJJ4ud1PQ2Yqncp1"},"YEdE1xwV2Bm6":{"SaL4FRQwW2m":"YEvvAekojCRgPC8J","qJjmnwqg0Zc6":"ULzt8h4kxYj64L6n","LHUTm0RZAmW3V":"WSooru0HVzJ6","g9A07iSCjOYDZS":"ZVdyVkaSsUdvPEffVeA","GIgjG1fd7sI2VHM1Mn":"vv7wMtKNQgOb"},…
no.unit.nva.indexingclient.models.IndexDocumentTest ‑ should throw exception when validating and missing mandatory fields:IndexDocument[consumptionAttributes=EventConsumptionAttributes[index=null, documentIdentifier=019d00bd6ce3-05dd7f4e-59e6-4a5d-9d81-80d998713860], resource={"d7EIUtPDj4NZFM47sx":{"zpnz1H762xDtNlWU":"tq4F70yzFZzwa94btw","6BQBsXRNx2yt53r":"jFZI8JMbJUm","FFhrYyeYgg1I0kCE":"cIMRYfnkTi4isJ26N","ZPNfl7V6xcdzUR":"vwJ0Kzo7XMSV","PEKGDIX0Eu":"iCYWquHLD7qbFn"},"TyIWZquKrW69vfZg":{"TuOb2xair50n1XPw":"VbmgY8chpwQI2wV8RBs","jf5lZHxJrkL":"JO9KMjfSRz7XpjpE","a99rUVOsswVh":"tRNpsDETGgSCq","eykcOmjzHpnBZUSr":"TF1lC1wgBWmBq5…
no.unit.nva.search.resource.ResourceClientAllScientificValuesTest ‑ [1] { "onlineIssn": "1903-6523" }

no.unit.nva.search.resource.ResourceClientAllScientificValuesTest ‑ [2] { "printIssn": "1903-6523" }

♻️ This comment has been updated with latest results.

LarsV123 marked this pull request as ready for review on March 19, 2026, 09:46