Zero-shot sentiment classification for Nigerian Pidgin Tweets

This project addresses sentiment classification. A system description paper on the project, written by my professor and me, was published in the SemEval-2023 proceedings. Here is a link to the paper.

The aim of the project is to investigate how using different languages for pre-training and fine-tuning affects the performance of zero-shot sentiment analysis for Twitter data in Nigerian Pidgin.

Nigerian Pidgin is an English-based creole language. A proficient speaker of English can easily recognize some of the words.

Nigerian Pidgin is considered a low-resource language, which is why sentiment classification remains challenging for it, even though the task is considered practically solved for high-resource languages such as English.

In this project, I use two pre-trained language models: bert-base-uncased and bert-base-multilingual-uncased (both Devlin et al., 2019). I fine-tune each model separately on three different languages (or, in one case, a language pair) to see how each affects sentiment classification performance in Nigerian Pidgin. The three fine-tuning settings are:

English
Nigerian Pidgin
Igbo & Hausa
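
As a rough illustration of the setup described above, the two checkpoints could be loaded for sequence classification with the Hugging Face transformers library. This is a minimal sketch and not necessarily what train_model.py does: the three-way label set (negative, neutral, positive), the helper name load_classifier, and the use of the Auto classes are all assumptions made for the example.

```python
# Sketch only: the label set and the helper name are assumptions,
# not taken from train_model.py.
MODEL_NAMES = ["bert-base-uncased", "bert-base-multilingual-uncased"]
LABELS = ["negative", "neutral", "positive"]  # assumed three-way sentiment labels


def load_classifier(model_name: str, labels=LABELS):
    """Return (tokenizer, model) with a freshly initialised classification head."""
    # Imported lazily so the constants above can be inspected
    # without transformers being installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label = dict(enumerate(labels))
    label2id = {label: i for i, label in id2label.items()}
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

Calling load_classifier("bert-base-multilingual-uncased") downloads the checkpoint on first use. The classification head is randomly initialised, so the model only becomes useful after fine-tuning on one of the settings listed above.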

Note that the setup that fine-tunes on Nigerian Pidgin is not zero-shot; it is included as a baseline. The data for Nigerian Pidgin, Igbo and Hausa is taken from Muhammad et al. (2023), while the English data is provided by Rosenthal et al. (2017). Igbo and Hausa are both Nigerian languages but, unlike English, they share no close linguistic ties with Nigerian Pidgin.
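
The resulting experiment grid (two models, three fine-tuning settings) can be sketched as below to make explicit which runs are zero-shot. The Run class and the constant names are hypothetical and exist only for this illustration; they are not part of the repository's scripts.

```python
from dataclasses import dataclass
from itertools import product

TARGET_LANGUAGE = "Nigerian Pidgin"
MODELS = ["bert-base-uncased", "bert-base-multilingual-uncased"]
FINE_TUNING_SETTINGS = [("English",), ("Nigerian Pidgin",), ("Igbo", "Hausa")]


@dataclass(frozen=True)
class Run:
    """One experiment: a pre-trained model plus the language(s) it is fine-tuned on."""

    model: str
    fine_tuning_languages: tuple

    @property
    def zero_shot(self) -> bool:
        # Zero-shot means the target language is never seen during fine-tuning.
        return TARGET_LANGUAGE not in self.fine_tuning_languages


# Every model is paired with every fine-tuning setting.
RUNS = [Run(model, langs) for model, langs in product(MODELS, FINE_TUNING_SETTINGS)]

for run in RUNS:
    kind = "zero-shot" if run.zero_shot else "baseline"
    print(f"{run.model:32} {' & '.join(run.fine_tuning_languages):16} {kind}")
```

Under these assumptions, the English and Igbo & Hausa settings are zero-shot for both models, while the two Nigerian Pidgin runs serve as in-language baselines.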

References

Shamsuddeen Hassan Muhammad, David Adelani, Sebastian Ruder, Ibrahim Sa’id Ahmad, Idris Abdulmumin, Shehu Bello Bello, Monojit Choudhury, Chris Chinenye Emezue, Saheed Salahuddeen Abdullahi, Anuoluwapo Aremu, Alipio Jeorge, and Pavel Brazdil. 2022. NaijaSenti: A Nigerian Twitter Sentiment Corpus for Multilingual Sentiment Analysis. In Proceedings of the 13th Language Resources and Evaluation Conference, pages 590–602, Marseille, France. European Language Resources Association.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In NAACL, pages 4171–4186.
Shamsuddeen Hassan Muhammad, Idris Abdulmumin, Abinew Ali Ayele, Nedjma Ousidhoum, David Ifeoluwa Adelani, Seid Muhie Yimam, Ibrahim Sa’id Ahmad, Meriem Beloucif, Saif M. Mohammad, Sebastian Ruder, Oumaima Hourrane, Pavel Brazdil, Felermino Dário Mário António Ali, Davis David, Salomey Osei, Bello Shehu Bello, Falalu Ibrahim, Tajuddeen Gwadabe, Samuel Rutunda, Tadesse Belay, Wendimu Baye Messelle, Hailu Beshada Balcha, Sisay Adugna Chala, Hagos Tesfahun Gebremichael, Bernard Opoku, and Steven Arthur. 2023. AfriSenti: A Twitter Sentiment Analysis Benchmark for African Languages.
Sara Rosenthal, Noura Farra, and Preslav Nakov. 2017. SemEval-2017 Task 4: Sentiment Analysis in Twitter. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 502–518.

Folders and files

.idea
images
README.md
acc_loss_plot.py
check_for_duplicates.py
confusion_matrix.py
eval_model.sh
label_distribution_per_language.py
merge_df_and_remove_duplicates.py
pre_processing.py
predictions_for_shared_task.py
raw_data.py
recall_scores.py
train_model.py
train_model.sh
