Automating Second Language Learner Error Analysis

UBC BREB: H24-00741
Ethics approval certificate is attached.

Files in this repo:

Note: files containing participant data will not be available on Github and instead submitted on Canvas.

finetuning.py
This is a copy of the notebook that finetunes the LLM. I used Google Colab for this file because I needed the GPUs. The final model was uploaded here to HuggingFace
happy.py
The happy file formats the original essays to be properly formatted and spaced, then calls an instance of the model from HuggingFace to correct them.
parse.ipynb
parse.ipynb counts error type frequencies from M2 file formats and attaches them to the respective participant.
graphs.ipynb
This plots comparisons of the results from results.csv and true_results.csv

Files submitted on Canvas

Participants are represented by a code, which matches them across these documents.

original_essays
These are the original essays from participants with errors as-is, including spacing.
spaced_formatted_essays
These are the formatted original essays output by happy.py
hand_corrected_m2
These are the manual annotations of the original essays, in M2 format.
model_corrected_txt
These are the corrected versions of the essays output by the model in happy.py in text format.
model_corrected_m2
These are the model annotations of the original essays by ERRANT, in M2 format. These were generated from the original and the text model-corrected essays.
scores
This contains the precision, recall, and F0.5 scores for each essay alongside overall averages. It also contains the bash commands used to activate the errant env, evaluate the model's performance against the manual annotations, and create the files in model_corrected_m2.
participant_data
This provides information on the participants who are represented by a code.
model_results
This displays the error frequency table created by parse.ipynb for model_corrected_m2
manual_results
This displays the error frequency table created by parse.ipynb for manual_corrected_m2

ERRANT folder

This is a clone of the ERRANT repo from the papers below, found here.

Christopher Bryant, Mariano Felice, and Ted Briscoe. 2017. Automatic annotation and evaluation of error types for grammatical error correction. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Vancouver, Canada.

Mariano Felice, Christopher Bryant, and Ted Briscoe. 2016. Automatic extraction of learner errors in ESL sentences using linguistically enhanced alignments. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers. Osaka, Japan.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
ApprovalCertificate.pdf		ApprovalCertificate.pdf
README.md		README.md
finetuning.ipynb		finetuning.ipynb
graphs.ipynb		graphs.ipynb
happy.py		happy.py
parse.ipynb		parse.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Automating Second Language Learner Error Analysis

Files in this repo:

Files submitted on Canvas

ERRANT folder

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Automating Second Language Learner Error Analysis

Files in this repo:

Files submitted on Canvas

ERRANT folder

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages