Feature Analysis for Native Language Identification

Cited by: 5
Authors
Nisioi, Sergiu [1, 2]
Affiliations
[1] Univ Bucharest, Ctr Computat Linguist, Bucharest, Romania
[2] Oracle RightNow, Bucharest, Romania
Keywords
CLASSIFICATION;
DOI
10.1007/978-3-319-18111-0_49
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
In this study we investigate the role of different features in the task of native language identification. For this purpose, we compile a learner corpus based on a subset of the EF Cambridge Open Language Database (EFCAMDAT) [10], developed at the University of Cambridge in collaboration with EF Education. The features we consider include character n-grams, positional token frequencies, part-of-speech n-grams, function words, shell nouns, and a set of annotated errors. Finally, we examine whether essays by English learners who share the same mother tongue can be distinguished by the learners' country of origin.
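As a rough illustration of two of the feature types named in the abstract, the following Python sketch counts character n-grams and computes relative function-word frequencies for a single essay. It is a minimal sketch under stated assumptions, not the study's implementation: the function names, the whitespace tokenisation, the lowercasing, and the short `FUNCTION_WORDS` list are all hypothetical choices made for this example.

```python
from collections import Counter

# Hypothetical, tiny function-word list for illustration only;
# the list used in the study is not given in this record.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "a", "that", "is")


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def function_word_freqs(text: str) -> dict:
    """Relative frequency of each listed function word among all tokens."""
    tokens = text.lower().split()  # assumption: plain whitespace tokenisation
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {w: counts[w] / total for w in FUNCTION_WORDS}


essay = "The cat is in the garden"
trigrams = char_ngrams(essay, n=3)
fw = function_word_freqs(essay)
```

Feature dictionaries of this kind would typically be turned into vectors and passed to a classifier; which classifier and weighting the study uses is not stated in this record.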
Pages: 644-657
Page count: 14
Related papers
  • [31] A Feature Normalisation Technique for PLLR based Language Identification Systems
    Fernando, Sarith
    Sethu, Vidhyasaharan
    Ambikairajah, Eliathamby
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2925 - 2929
  • [32] Feature Relevance Analysis for Writer Identification
    Siddiqi, Imran
    Khurshid, Khurram
    Vincent, Nicole
    [J]. DOCUMENT RECOGNITION AND RETRIEVAL XVIII, 2011, 7874
  • [35] Language feature mining for document subjectivity analysis
    Chen, Bo
    He, Hui
    Guo, Jun
    [J]. PROCEEDINGS OF THE FIRST INTERNATIONAL SYMPOSIUM ON DATA, PRIVACY, AND E-COMMERCE, 2007, : 62 - 67
  • [36] CircRNA identification and feature interpretability analysis
    Niu, Mengting
    Wang, Chunyu
    Chen, Yaojia
    Zou, Quan
    Qi, Ren
    Xu, Lei
    [J]. BMC BIOLOGY, 2024, 22 (01)
  • [37] Acoustic Feature Analysis and Discriminative Modeling for Language Identification of Closely Related South-Asian Languages
    Adeeba, Farah
    Hussain, Sarmad
    [J]. CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2018, 37 (08) : 3589 - 3604
  • [38] The catchment feature model for multimodal language analysis
    Quek, F
    [J]. NINTH IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION, VOLS I AND II, PROCEEDINGS, 2003, : 540 - 547
  • [39] Redundant Feature Identification and Redundancy Analysis for Causal Feature Selection
    Limshuebchuey, Asavaron
    Duangsoithong, Rakkrit
    Windeatt, Terry
    [J]. 2015 8TH BIOMEDICAL ENGINEERING INTERNATIONAL CONFERENCE (BMEICON), 2015,
  • [40] Native and non-native teachers in a minority language: An analysis of stakeholders' opinions
    Colmenero, Kebir
    Lasagabaster, David
    [J]. INTERNATIONAL JOURNAL OF BILINGUALISM, 2024, 28 (02) : 188 - 203