Skip to content

Remove dynamic dispatch, reformulate TokenStream as Iterators#962

Closed
troublescooter wants to merge 20 commits intoquickwit-oss:mainfrom
troublescooter:static
Closed

Remove dynamic dispatch, reformulate TokenStream as Iterators#962
troublescooter wants to merge 20 commits intoquickwit-oss:mainfrom
troublescooter:static

Conversation

@troublescooter
Copy link

This PR needs a Clone implementation in the rust-stemmers crate.

This changes the TokenFilter and Tokenizer traits to an associated types API and removes the TokenStream trait formulating what depended on it as iterators.

The main idea is to take the struct TextAnalyzer which contained the dynamically dispatched types and move the dynamic dispatch a layer up, now hiding the statically dispatched types behind dyn TextAnalyzerT (placeholder name).

Can you benchmark these changes?

@troublescooter
Copy link
Author

@fulmicoton
Copy link
Collaborator

Thanks. Give me a bit of time to review your work.

@fulmicoton
Copy link
Collaborator

I imported your branch in tantivy-search, and added a benchmark.

I also imported your branch in tantivy-search, with the benchmark.
You can check it here. #967

Tokenizing alice in wonderlong is around 40% slower on your branch than in main.
main - 1.4ms
after-your-change: 2ms

Feel free to investigate. I did not profile, but I was expecting a larger regression.
In main, there are way less allocation, since I reuse keep copying the data in the same Token's String buffer.

Because you effectively emit the tokens, you need to allocate a new buffer for every token.

@fulmicoton
Copy link
Collaborator

About the removal of the internal static dispatch. tantivy used to do something like that.
811fd0c#diff-54c918ab79693b0b4c4573f4cb7d8c1e21e21e913b38100fa26ed09989589c21

I modified the code to dynamic dispatch for all of the tokenfilters to simplify the code and avoid gigantic binaries. I never actually timed it though.

@fulmicoton fulmicoton closed this May 17, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants