### Motivation

This model is huge. Why not use a relatively smaller text model as the encoder?

### Related resources

_No response_