Deep Learning Based Sentiment Analysis in a Code-Mixed English-Hindi and English-Bengali Social Media Corpus

Cited by: 14
Authors
Jamatia, Anupam [1]
Swamy, Steve Durairaj [1]
Gamback, Bjorn [2, 4]
Das, Amitava [3]
Debbarma, Swapan [1]
Affiliations
[1] Natl Inst Technol, Comp Sci & Engn Dept, Agartala 799046, Tripura, India
[2] Norwegian Univ Sci & Technol, Dept Comp Sci, N-7491 Trondheim, Norway
[3] Wipro AI Labs, Bengaluru 560100, Karnataka, India
[4] Res Inst Sweden AB, RISE, Digital Syst Div, S-16428 Kista, Sweden
Keywords
Code-switching; recurrent neural networks; convolutional neural networks
DOI
10.1142/S0218213020500141
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Sentiment analysis is a circumstantial analysis of text, identifying the social sentiment to better understand the source material. The article addresses sentiment analysis of an English-Hindi and English-Bengali code-mixed textual corpus collected from social media. Code-mixing is an amalgamation of multiple languages, which was previously associated mainly with spoken language. However, social media users also deploy it to communicate in ways that tend to be somewhat casual. The coarse nature of social media text poses challenges for many language processing applications. Here, the focus is on the low predictive performance of traditional machine learners compared with their Deep Learning counterparts, including the contextual language representation model BERT (Bidirectional Encoder Representations from Transformers), on the task of extracting user sentiment from code-mixed texts. Three deep learners (a BiLSTM CNN, a Double BiLSTM and an Attention-based model) attained accuracy 20-60% greater than traditional approaches on code-mixed data and, for comparison, were also tested on monolingual English data.
Pages: 27