Exploiting native language interference for native language identification

Cited by: 1
Authors
Markov, Ilia [1]
Nastase, Vivi [2]
Strapparava, Carlo [3]
Affiliations
[1] Univ Antwerp, CLiPS, Antwerp, Belgium
[2] Univ Stuttgart, Stuttgart, Germany
[3] Fdn Bruno Kessler, FBK Irst, Trento, Italy
Keywords
Native language interference; native language identification; punctuation; emotions; cognates
DOI
10.1017/S1351324920000595
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Native language identification (NLI), the task of automatically identifying the native language (L1) of persons based on their writings in the second language (L2), is based on the hypothesis that characteristics of L1 will surface and interfere in the production of texts in L2 to the extent that L1 is identifiable. We present an in-depth investigation of features that model a variety of linguistic phenomena potentially involved in native language interference in the context of the NLI task: the languages' structuring of information through punctuation usage, emotion expression in language, and similarities of form with the L1 vocabulary through the use of anglicized words, cognates, and other misspellings. The results of experiments with different combinations of features in a variety of settings allow us to quantify the native language interference value of these linguistic phenomena and show how robust they are in cross-corpus experiments and with respect to proficiency in L2. These experiments provide a deeper insight into the NLI task, showing how native language interference explains the gap between baseline, corpus-independent features, and the state of the art that relies on features/representations that cover (indiscriminately) a variety of linguistic phenomena.
Pages: 167-197
Number of pages: 31
Related papers
50 in total
  • [1] Portuguese Native Language Identification
    Malmasi, Shervin
    del Rio, Iria
    Zampieri, Marcos
    [J]. COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2018, 2018, 11122 : 115 - 124
  • [2] Multilingual native language identification
    Malmasi, Shervin
    Dras, Mark
    [J]. NATURAL LANGUAGE ENGINEERING, 2017, 23 (02) : 163 - 215
  • [3] Addressing Cultural and Native Language Interference in Second Language Acquisition
    Allard, Daniele
    Bourdeau, Jacqueline
    Mizoguchi, Riichiro
    [J]. CALICO JOURNAL, 2011, 28 (03) : 677 - 698
  • [4] Bridging the Native Language and Language Variety Identification Tasks
    Franco-Salvador, Marc
    Kondrak, Greg
    Rosso, Paolo
    [J]. KNOWLEDGE-BASED AND INTELLIGENT INFORMATION & ENGINEERING SYSTEMS, 2017, 112 : 1554 - 1561
  • [5] Feature Analysis for Native Language Identification
    Nisioi, Sergiu
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING (CICLING 2015), PT I, 2015, 9041 : 644 - 657
  • [6] Encouragement of the Adolescent Having Language Interference during the Lessons of the Native Language
    Usca, Svetlana
    Vigante, Rasma
    [J]. SOCIETY, INTEGRATION, EDUCATION, 2010 : 374 - 381
  • [7] The native language
    Serrano, Julio
    [J]. CUADERNOS HISPANOAMERICANOS, 2015, (775) : 111 - 114
  • [8] 'Native language'
    Harmon, A. G.
    [J]. ANTIOCH REVIEW, 2008, 66 (03) : 489 - 502
  • [9] 'Native Language'
    Wallace, H
    [J]. MIDWEST QUARTERLY-A JOURNAL OF CONTEMPORARY THOUGHT, 2004, 45 (04) : 398 - 398
  • [10] 'NATIVE LANGUAGE'
    BRUCE, D
    [J]. LITERARY REVIEW, 1982, 25 (03) : 369 - 369