Creating and evaluating Grammatical Error Correction models with arbitrary error correction profiles

Author(s): Klampe, T. (2023)

Abstract:
Generating high-quality synthetic datasets with use-case-specific error frequencies can substantially improve the performance of Grammatical Error Correction (GEC) models. In this paper, I propose a system that creates datasets according to specified error frequencies using a tagged grammatical corruption model. The effect of these frequencies is then measured with error-specific accuracy testing. The system can be used to flexibly generate synthetic datasets and train a GEC model on them. The model's accuracy is analyzed and can then be improved iteratively by adjusting the error frequencies in the dataset and comparing the resulting changes in accuracy. I demonstrate the generation and evaluation of a GEC model that takes into account the expected error profile of a native English speaker.
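The loop described in the abstract (sample errors according to a target frequency profile, corrupt clean text, then score a corrector separately for each error type) can be illustrated with a small sketch. It rests on stated assumptions: the error tags (`DET`, `SVA`, `SPELL`), the toy string-replacement rules, the profile values, and the helper names `make_dataset` and `per_tag_accuracy` are all hypothetical. The rules stand in for the thesis's tagged corruption model and are not that model. "Error-specific accuracy" is approximated here as the exact-match rate per tag, which may differ from the metric actually used.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical error tags with toy string rules. These stand in for the
# tagged corruption model described in the abstract; they are NOT that model.
CORRUPTIONS: Dict[str, Callable[[str], str]] = {
    "DET": lambda s: s.replace("the ", "", 1),       # drop a determiner
    "SVA": lambda s: s.replace(" is ", " are ", 1),  # break subject-verb agreement
    "SPELL": lambda s: s.replace("ie", "ei", 1),     # common misspelling
}


def make_dataset(clean: List[str], profile: Dict[str, float], n: int,
                 seed: int = 0) -> List[Tuple[str, str, str]]:
    """Sample n (corrupted, clean, tag) triples; tags are drawn according to
    the relative frequencies in `profile`."""
    rng = random.Random(seed)
    tags = list(profile)
    weights = [profile[t] for t in tags]
    # Pre-compute which clean sentences each rule actually changes, so the
    # sampled tag distribution is not skewed by rejected corruptions.
    candidates = {t: [s for s in clean if CORRUPTIONS[t](s) != s] for t in tags}
    for t in tags:
        if not candidates[t]:
            raise ValueError(f"no sentence can be corrupted with tag {t!r}")
    data = []
    for tag in rng.choices(tags, weights=weights, k=n):
        target = rng.choice(candidates[tag])
        data.append((CORRUPTIONS[tag](target), target, tag))
    return data


def per_tag_accuracy(data: List[Tuple[str, str, str]],
                     correct: Callable[[str], str]) -> Dict[str, float]:
    """Fraction of examples per error tag that `correct` restores exactly."""
    hits, totals = Counter(), Counter()
    for source, target, tag in data:
        totals[tag] += 1
        hits[tag] += int(correct(source) == target)
    return {tag: hits[tag] / totals[tag] for tag in totals}


def fix_sva(sentence: str) -> str:
    """A toy 'model' that only repairs the agreement errors introduced above."""
    return sentence.replace(" are ", " is ", 1)


if __name__ == "__main__":
    clean = ["the cat is on the mat", "I believe the answer is right",
             "a friend is here"]
    profile = {"SPELL": 0.5, "DET": 0.3, "SVA": 0.2}  # illustrative values only
    data = make_dataset(clean, profile, n=100)
    print(per_tag_accuracy(data, fix_sva))
```

Because the accuracy is reported per tag, a weak score on one error type points directly at which frequency in the profile to raise for the next training round, which is the iterative adjustment the abstract describes.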

Document(s):

Klampe_BA_EEMCS.pdf