Feature Analysis for Native Language Identification

Times cited: 5
Author
Nisioi, Sergiu [1, 2]
Affiliations
[1] University of Bucharest, Center for Computational Linguistics, Bucharest, Romania
[2] Oracle RightNow, Bucharest, Romania
Keywords
Classification
DOI
10.1007/978-3-319-18111-0_49
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this study we investigate the role of different features in the task of native language identification. For this purpose, we compile a learner corpus from a subset of the EF-Cambridge Open Language Database (EFCAMDAT) [10], developed at the University of Cambridge in collaboration with EF Education. The features we consider include character n-grams, positional token frequencies, part-of-speech n-grams, function words, shell nouns and a set of annotated errors. Finally, we examine whether essays written by English learners who share the same native language can be distinguished by their country of origin.
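To make the first feature type concrete, here is a minimal Python sketch that turns one essay into character n-gram relative frequencies. The helper names `char_ngrams` and `char_ngram_features`, the lowercasing step and the default n values (1 to 3) are illustrative assumptions, not the configuration used in the study, and the other feature types listed above (part-of-speech n-grams, function words, shell nouns, errors) are not covered.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count overlapping character n-grams in `text` (hypothetical helper)."""
    # Slide a window of width n over the string. Spaces and punctuation are
    # kept, so n-grams can span word boundaries.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def char_ngram_features(text: str, ns=(1, 2, 3)) -> dict:
    """Relative frequencies of character n-grams for each n in `ns`.

    Hypothetical helper: lowercasing and the choice of n values are
    illustrative assumptions, not settings taken from the paper.
    """
    text = text.lower()
    features = {}
    for n in ns:
        counts = char_ngrams(text, n)
        total = sum(counts.values())
        for gram, count in counts.items():
            # Prefix the key with n so n-grams of different orders never collide.
            features[f"{n}:{gram}"] = count / total
    return features
```

For example, `char_ngram_features("I am agree.", ns=(2,))` maps the key `"2: a"` to 0.2, because the bigram " a" occurs twice among the ten bigrams of the lowercased string. Vectors like this would then be passed to a classifier; normalizing by the total count keeps essays of different lengths comparable.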
Pages: 644-657
Number of pages: 14