Code-Mixing in Social Media Text: The Last Language Identification Frontier?

Authors
Das, Amitava [1]
Gamback, Bjoern [2]
Affiliations
[1] NIIT Univ, Neemrana 301705, Rajasthan, India
[2] Norwegian Univ Sci & Technol, N-7491 Trondheim, Norway
Source
TRAITEMENT AUTOMATIQUE DES LANGUES | 2013 / Vol. 54 / Issue 03
Keywords
Code-mixing; code-switching; social media text; language identification
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Discipline classification codes
030303; 0501; 050102
Abstract
Automatic understanding of noisy social media text is one of the prime present-day research areas. Most research has so far concentrated on English texts; however, more than half of the users are writing in other languages, making language identification a prerequisite for comprehensive processing of social media text. Though language identification has been considered an almost solved problem in other applications, language detectors fail in the social media context due to phenomena such as code-mixing, code-switching, lexical borrowings, Anglicisms, and phonetic typing. This paper reports an initial study to understand the characteristics of code-mixing in the social media context and presents a system developed to automatically detect language boundaries in code-mixed social media text, here exemplified by Facebook messages in mixed English-Bengali and English-Hindi.
Pages: 41-64
Page count: 24