Native Language Identification with User Generated Content

Authors
Goldin, Gili [1]
Rabinovich, Ella [2, 3]
Wintner, Shuly [1]
Affiliations
[1] Univ Haifa, Dept Comp Sci, Haifa, Israel
[2] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
[3] Univ Haifa, Haifa, Israel
Funding
U.S. National Science Foundation
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We address the task of native language identification in the context of social media content, where authors are highly fluent, advanced nonnative speakers (of English). Using both linguistically motivated features and the characteristics of the social media outlet, we obtain high accuracy on this challenging task. We provide a detailed analysis of the features that sheds light on differences between native and nonnative speakers, and among nonnative speakers with different backgrounds.
Pages: 3591-3601
Number of pages: 11
Related papers
  • [1] #impressme: The Language of Motivation in User Generated Content
    Tomlinson, Marc T.
    Bracewell, David B.
    Krug, Wayne
    Hinote, David
    [J]. COMPUTATIONAL LINGUISTICS AND INTELLIGENT TEXT PROCESSING, CICLING 2014, PART II, 2014, 8404 : 176 - 187
  • [2] The Janes project: language resources and tools for Slovene user generated content
    Darja Fišer
    Nikola Ljubešić
    Tomaž Erjavec
    [J]. Language Resources and Evaluation, 2020, 54 : 223 - 246
  • [3] The Janes project: language resources and tools for Slovene user generated content
    Fiser, Darja
    Ljubesic, Nikola
    Erjavec, Tomaz
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2020, 54 (01) : 223 - 246
  • [4] User-generated content
    Wofford, Jennifer
    [J]. NEW MEDIA & SOCIETY, 2012, 14 (07) : 1236 - 1239
  • [5] User-generated content
    Greenfield, David
    [J]. CONTROL ENGINEERING, 2009, 56 (10) : 2 - 2
  • [6] Analysing Translators' Language Problems (and Solutions) Through User-generated Content
    Cibej, Jaka
    Gorjanc, Vojko
    Popic, Damjan
    [J]. PROCEEDINGS OF THE XVII EURALEX INTERNATIONAL CONGRESS: LEXICOGRAPHY AND LINGUISTIC DIVERSITY, 2016 : 158 - 167
  • [7] The Effect of Types of Language Mistakes on the Persuasiveness of User-Generated Content on Facebook
    Meir, Naama
    Tal-Or, Nurit
    [J]. PSYCHOLOGY OF POPULAR MEDIA, 2024
  • [8] Document and Word-level Language Identification for Noisy User Generated Text
    Kozhirbayev, Zhanibek
    Yessenbayev, Zhandos
    Makazhanov, Aibek
    [J]. 2018 IEEE 12TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES (AICT), 2018 : 124 - 127
  • [9] Exploiting native language interference for native language identification
    Markov, Ilia
    Nastase, Vivi
    Strapparava, Carlo
    [J]. NATURAL LANGUAGE ENGINEERING, 2022, 28 (02) : 167 - 197
  • [10] Journalistic Source Discovery: Supporting The Identification of News Sources in User Generated Content
    Wang, Yixue
    Diakopoulos, Nicholas
    [J]. CHI '21: PROCEEDINGS OF THE 2021 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, 2021