Remove dynamic dispatch, reformulate TokenStream as Iterators #962
troublescooter wants to merge 20 commits into quickwit-oss:main from troublescooter:static
Thanks. Give me a bit of time to review your work.
I imported your branch in tantivy-search and added a benchmark. Tokenizing Alice in Wonderland is around 40% slower on your branch than on main. Feel free to investigate. I did not profile, but I was expecting a larger regression. Because you effectively emit the tokens, you need to allocate a new buffer for every token.
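To make the allocation point concrete, here is a minimal sketch, not tantivy's actual code. `Token`, `tokens_owned`, and `ReusingStream` are hypothetical names chosen for illustration. The iterator version has to hand out an owned `Token`, so each `next()` allocates a new `String`, while the stream version overwrites a single buffer in place.

```rust
// Hypothetical types for illustration only; not tantivy's actual API.

/// A token that owns its text.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub position: usize,
    pub text: String,
}

/// Iterator-style tokenizer: every emitted token allocates a fresh `String`.
pub fn tokens_owned(text: &str) -> impl Iterator<Item = Token> + '_ {
    text.split_whitespace().enumerate().map(|(position, word)| Token {
        position,
        text: word.to_ascii_lowercase(), // new heap allocation per token
    })
}

/// Stream-style tokenizer: one `Token` is reused for the whole text.
pub struct ReusingStream<'a> {
    words: std::str::SplitWhitespace<'a>,
    token: Token,
    next_position: usize,
}

impl<'a> ReusingStream<'a> {
    pub fn new(text: &'a str) -> Self {
        ReusingStream {
            words: text.split_whitespace(),
            token: Token { position: 0, text: String::new() },
            next_position: 0,
        }
    }

    /// Moves to the next token; returns `false` once the text is exhausted.
    pub fn advance(&mut self) -> bool {
        match self.words.next() {
            Some(word) => {
                self.token.position = self.next_position;
                self.next_position += 1;
                self.token.text.clear(); // keeps the existing capacity
                self.token.text.push_str(word);
                self.token.text.make_ascii_lowercase();
                true
            }
            None => false,
        }
    }

    /// Borrows the current token; valid until the next `advance()` call.
    pub fn token(&self) -> &Token {
        &self.token
    }
}
```

The borrow returned by `token()` cannot outlive the next `advance()`, which is exactly what a plain `Iterator` cannot express and why the iterator formulation ends up with owned tokens.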
About the internal static dispatch: tantivy used to do something like that. I changed all of the token filters to dynamic dispatch to simplify the code and avoid gigantic binaries. I never actually timed it, though.
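For readers weighing the two approaches, a rough sketch of the trade-off follows. The `Filter` trait and both `run_*` functions are hypothetical and much simpler than tantivy's real token filters. The dynamic version is compiled once and takes any runtime list of filters; the static version is monomorphized for each combination of filter types, which is where the binary growth comes from.

```rust
// Hypothetical trait for illustration; not tantivy's actual `TokenFilter`.
pub trait Filter {
    /// Returns the rewritten token text, or `None` to drop the token.
    fn apply(&self, text: String) -> Option<String>;
}

pub struct LowerCaser;

impl Filter for LowerCaser {
    fn apply(&self, text: String) -> Option<String> {
        Some(text.to_lowercase())
    }
}

pub struct RemoveLong {
    pub limit: usize,
}

impl Filter for RemoveLong {
    fn apply(&self, text: String) -> Option<String> {
        if text.len() < self.limit {
            Some(text)
        } else {
            None
        }
    }
}

/// Dynamic dispatch: one compiled copy, each `apply` goes through a vtable.
pub fn run_dyn(filters: &[Box<dyn Filter>], text: String) -> Option<String> {
    // Stops at the first filter that returns `None`.
    filters.iter().try_fold(text, |t, f| f.apply(t))
}

/// Static dispatch: a separate copy is generated for each `(A, B)` pair,
/// so calls can be inlined at the cost of more generated code.
pub fn run_static<A: Filter, B: Filter>(a: &A, b: &B, text: String) -> Option<String> {
    a.apply(text).and_then(|t| b.apply(t))
}
```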
This PR needs a `Clone` implementation in the rust-stemmers crate.

This changes the `TokenFilter` and `Tokenizer` traits to an associated types API and removes the `TokenStream` trait, formulating what depended on it as iterators.

The main idea is to take the struct `TextAnalyzer`, which contained the dynamically dispatched types, and move the dynamic dispatch a layer up, now hiding the statically dispatched types behind `dyn TextAnalyzerT` (placeholder name).

Can you benchmark these changes?
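The shape described in the PR description could look roughly like the sketch below. This is an assumption about the design, not the code in the branch: only `TextAnalyzer` and `TextAnalyzerT` are names from the description, and everything else (`Token`, `WhitespaceTokenizer`, `LowerCaser`, `RemoveLong`, `Chain`, the method signatures) is hypothetical. The tokenizer and filters are generic parameters, so they are statically dispatched, and the only trait object is the outer `dyn TextAnalyzerT`. Filters are cloned per stream, which is one plausible reason a `Clone` implementation would be needed for a stemmer filter.

```rust
// Hypothetical sketch; signatures are assumptions, not the PR's actual code.
#[derive(Debug, Clone, PartialEq)]
pub struct Token {
    pub position: usize,
    pub text: String,
}

/// Tokenizer with an associated iterator type (static dispatch).
pub trait Tokenizer<'a> {
    type Iter: Iterator<Item = Token> + 'a;
    fn token_stream(&self, text: &'a str) -> Self::Iter;
}

/// Filter that rewrites or drops one token at a time.
pub trait TokenFilter: Clone {
    fn transform(&mut self, token: Token) -> Option<Token>;
}

#[derive(Clone)]
pub struct WhitespaceTokenizer;

pub struct WhitespaceIter<'a> {
    words: std::str::SplitWhitespace<'a>,
    position: usize,
}

impl<'a> Iterator for WhitespaceIter<'a> {
    type Item = Token;
    fn next(&mut self) -> Option<Token> {
        let word = self.words.next()?;
        let token = Token { position: self.position, text: word.to_string() };
        self.position += 1;
        Some(token)
    }
}

impl<'a> Tokenizer<'a> for WhitespaceTokenizer {
    type Iter = WhitespaceIter<'a>;
    fn token_stream(&self, text: &'a str) -> Self::Iter {
        WhitespaceIter { words: text.split_whitespace(), position: 0 }
    }
}

#[derive(Clone)]
pub struct LowerCaser;

impl TokenFilter for LowerCaser {
    fn transform(&mut self, mut token: Token) -> Option<Token> {
        token.text.make_ascii_lowercase();
        Some(token)
    }
}

#[derive(Clone)]
pub struct RemoveLong {
    pub limit: usize,
}

impl TokenFilter for RemoveLong {
    fn transform(&mut self, token: Token) -> Option<Token> {
        if token.text.len() < self.limit { Some(token) } else { None }
    }
}

/// Two filters composed at compile time.
#[derive(Clone)]
pub struct Chain<A, B>(pub A, pub B);

impl<A: TokenFilter, B: TokenFilter> TokenFilter for Chain<A, B> {
    fn transform(&mut self, token: Token) -> Option<Token> {
        self.0.transform(token).and_then(|t| self.1.transform(t))
    }
}

/// Statically typed analyzer: tokenizer and filter are generic parameters.
pub struct TextAnalyzer<T, F> {
    tokenizer: T,
    filter: F,
}

impl<T, F> TextAnalyzer<T, F> {
    pub fn new(tokenizer: T, filter: F) -> Self {
        TextAnalyzer { tokenizer, filter }
    }
}

/// The single dynamic layer: callers hold a `Box<dyn TextAnalyzerT>`.
pub trait TextAnalyzerT {
    fn token_stream<'a>(&self, text: &'a str) -> Box<dyn Iterator<Item = Token> + 'a>;
}

impl<T, F> TextAnalyzerT for TextAnalyzer<T, F>
where
    T: for<'a> Tokenizer<'a> + 'static,
    F: TokenFilter + 'static,
{
    fn token_stream<'a>(&self, text: &'a str) -> Box<dyn Iterator<Item = Token> + 'a> {
        let mut filter = self.filter.clone(); // per-stream copy of the filter state
        Box::new(self.tokenizer.token_stream(text).filter_map(move |t| filter.transform(t)))
    }
}
```

With this layout there is one virtual call per `next()` on the boxed iterator, while the tokenizer and the whole filter chain inside it can be inlined.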