An effective cybernated word embedding system for analysis and language identification in code-mixed social media text

Cited by: 4
Authors
Shekhar, Shashi [1]
Sharma, Dilip Kumar [1]
Beg, M. M. Sufyan [2]
Affiliations
[1] GLA Univ, Dept Comp Engn & Applicat, Mathura 281406, India
[2] Aligarh Muslim Univ, Dept Comp Engn, Aligarh 202002, Uttar Pradesh, India
Keywords
Language identification; transliteration; character embedding; word embedding; Natural Language Processing; cBoW; skip-gram
DOI
10.3233/KES-190409
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Social media users now commonly write code-mixed text, i.e., text that mixes two or more languages. This paper describes the application of the code-mixed index to Indian social media texts and compares the complexity of identifying language at the word level using a Bi-directional Long Short-Term Memory model. Social media platforms are now widely used by people to express their opinions and interests. The major contribution of the work is a technique for identifying the language of Hindi-English code-mixed data from three social media platforms, namely Facebook, Twitter, and WhatsApp. We recommend a deep learning framework based on the cBoW and skip-gram models that predicts the language of origin of each word in a sequence based on the specific words that have come before it in the sequence. The context capture module of the system gives better accuracy for the word embedding model than for character embedding.
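The difference between the two embedding objectives named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window size, and the romanized Hindi-English sentence are hypothetical, and a symmetric context window is assumed (the standard word2vec formulation), which the abstract does not confirm. cBoW predicts a target word from its surrounding context, while skip-gram predicts each context word from the target.

```python
from typing import List, Tuple


def cbow_pairs(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context words, target) pairs: cBoW predicts the target from its context."""
    pairs = []
    for i, target in enumerate(tokens):
        # Up to `window` words on each side of position i (assumed symmetric window).
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            pairs.append((context, target))
    return pairs


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (target, context word) pairs: skip-gram predicts each context word from the target."""
    pairs = []
    for i, target in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


# Hypothetical code-mixed sentence, used only for illustration.
sentence = ["yaar", "this", "movie", "bahut", "acchi", "hai"]

for context, target in cbow_pairs(sentence, window=1):
    print("cBoW     ", context, "->", target)
for target, context_word in skipgram_pairs(sentence, window=1):
    print("skip-gram", target, "->", context_word)
```

In a full system these pairs would feed an embedding layer whose vectors are then passed to a word-level language classifier; that step is omitted here because the record gives no details about it.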
Pages: 167-179
Number of pages: 13