Switch to XLM-RoBERTa for Asian language support #457

## Motivation

To improve support for Asian languages, we should replace our current model with [XLM-RoBERTa](https://huggingface.co/docs/transformers/en/model_doc/xlm-roberta).

XLM-RoBERTa is a multilingual model pre-trained on CommonCrawl text in 100 languages, including a wide range of Asian languages (Chinese, Japanese, Korean, Thai, Vietnamese, Hindi, etc.). Its authors report that it outperforms multilingual BERT (mBERT) on cross-lingual benchmarks such as XNLI, MLQA, and NER.

## Proposal

- Evaluate XLM-RoBERTa (`xlm-roberta-base` and/or `xlm-roberta-large`) as a replacement for the current model.
- Benchmark performance on Asian language inputs against the current setup.
- Update the model loading / inference code paths to support XLM-RoBERTa.
- Update documentation and any related configs.
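
As a starting point for the evaluation and benchmarking steps above, here is a minimal sketch of a per-language comparison harness. It makes several assumptions that are not part of this issue: the task is treated as single-label classification, the evaluation data is a list of `(language, text, label)` tuples, and both models are wrapped as plain callables that map a batch of texts to labels. The names `load_xlmr`, `accuracy_by_language`, and `compare` are hypothetical. `load_xlmr` uses the standard `AutoTokenizer` / `AutoModel` loaders from `transformers` and only returns the bare encoder; any task head would still need to be added and fine-tuned.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

# (language code, input text, expected label); this data layout is an assumption.
Example = Tuple[str, str, str]
# A predictor maps a batch of texts to one predicted label per text.
Predictor = Callable[[List[str]], List[str]]


def load_xlmr(checkpoint: str = "xlm-roberta-base"):
    """Load the XLM-RoBERTa tokenizer and encoder.

    Needs `transformers`, `torch`, and `sentencepiece` installed, plus network
    access or a local cache. Imported lazily so the harness runs without them.
    """
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    return tokenizer, model


def accuracy_by_language(
    predict: Predictor, examples: Iterable[Example]
) -> Dict[str, float]:
    """Group examples by language and compute accuracy for each group."""
    by_lang: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for lang, text, label in examples:
        by_lang[lang].append((text, label))

    scores: Dict[str, float] = {}
    for lang, rows in by_lang.items():
        preds = predict([text for text, _ in rows])
        correct = sum(pred == label for pred, (_, label) in zip(preds, rows))
        scores[lang] = correct / len(rows)
    return scores


def compare(
    current: Predictor, candidate: Predictor, examples: Iterable[Example]
) -> Dict[str, float]:
    """Return candidate accuracy minus current accuracy, per language."""
    examples = list(examples)  # allow two passes over a generator
    base = accuracy_by_language(current, examples)
    cand = accuracy_by_language(candidate, examples)
    return {lang: cand[lang] - base[lang] for lang in base}
```

A positive value for a language in the output of `compare` means the candidate did better on that language. Keeping the models behind a plain callable lets the same harness score the current setup, `xlm-roberta-base`, and `xlm-roberta-large` without changes.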

## References

- Hugging Face docs: https://huggingface.co/docs/transformers/en/model_doc/xlm-roberta
