Neural Grammatical Error Correction for Romanian

Citations: 5
Authors
Cotet, Teodor-Mihai [1]
Ruseti, Stefan [1]
Dascalu, Mihai [1]
Affiliation
[1] Univ Politehn Bucuresti, Comp Sci Dept, Bucharest, Romania
Keywords
Grammatical Error Correction; Transformer; Romanian; low-resource language; ERRANT
DOI
10.1109/ICTAI50040.2020.00101
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Resources for Grammatical Error Correction (GEC) in non-English languages are scarce, and the spellcheckers available for these languages are mostly limited to simple, rule-based corrections. In this paper we introduce a first GEC corpus for Romanian, consisting of 10k sentence pairs. In addition, the German version of the ERRANT (ERRor ANnotation Toolkit) scorer was adapted for Romanian to analyze this corpus and extract the edits needed for evaluation. We experimented with multiple neural models and with pretraining strategies, which proved effective for GEC in low-resource settings. Our baseline is a small Transformer model trained only on the GEC dataset (F0.5 = 44.38), whereas the best-performing model is obtained by pretraining a larger Transformer model on artificially generated data and then fine-tuning it on the actual corpus (F0.5 = 53.76). The proposed method for generating additional training examples is easily extensible and can be applied to any language, as it requires only a POS tagger.
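The abstract reports results as F0.5 over extracted edits. As a rough illustration only, the sketch below shows how an F0.5 score can be computed from a set of hypothesis edits and a set of reference edits. The edit representation (start, end, correction) and the function names `f_beta` and `score_edits` are assumptions made for this example; this is not the ERRANT scorer or the authors' evaluation code, and it ignores details such as multiple references and corpus-level aggregation.

```python
from typing import Iterable, Tuple

# Hypothetical edit representation: (start token, end token, correction string).
Edit = Tuple[int, int, str]


def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from edit counts; beta < 1 weights precision above recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def score_edits(hyp_edits: Iterable[Edit], ref_edits: Iterable[Edit],
                beta: float = 0.5) -> float:
    """Compare system edits with reference edits by exact match."""
    hyp, ref = set(hyp_edits), set(ref_edits)
    tp = len(hyp & ref)   # edits proposed by the system and present in the reference
    fp = len(hyp - ref)   # edits proposed by the system but not in the reference
    fn = len(ref - hyp)   # reference edits the system missed
    return f_beta(tp, fp, fn, beta)


if __name__ == "__main__":
    hyp = [(0, 1, "a"), (2, 3, "b"), (4, 5, "c")]
    ref = [(0, 1, "a"), (2, 3, "b"), (6, 7, "d"), (8, 9, "e")]
    print(score_edits(hyp, ref))
```

With beta = 0.5, a system that proposes few but correct edits scores higher than one that finds more errors at the cost of spurious corrections, which is the usual motivation for preferring F0.5 in GEC.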
Pages: 625-631
Number of pages: 7