A self-attention hybrid emoji prediction model for code-mixed language: (Hinglish)

Cited by: 0
Authors
Gadde Satya Sai Naga Himabindu
Rajat Rao
Divyashikha Sethia
Affiliation
[1] Delhi Technological University, Department of Computer Engineering
Source
SOCIAL NETWORK ANALYSIS AND MINING, 2022, 12 (01)
Keywords
Emoji prediction; Hinglish; Code mixed; Deep learning; Hybrid model
DOI
Not available
Abstract
Emojis are an essential tool for communication, and emoji prediction systems are already available for resource-rich languages such as English. However, research on emoji prediction for resource-poor and code-mixed languages remains limited, including for Hinglish (Hindi + English), the fourth most used code-mixed language globally. This paper proposes a novel Hinglish Emoji Prediction (HEP) dataset, built from a Twitter corpus, and a hybrid emoji prediction model, BiLSTM attention random forest (BARF), for the code-mixed Hinglish language. The proposed BARF model combines deep learning features with machine learning classification. It first uses a BiLSTM to capture context, then applies self-attention to extract the significant parts of the text, and finally uses a random forest to classify the resulting features and predict an emoji. The self-attention mechanism aids learning because Hinglish, as a code-mixed language, lacks proper grammatical rules. Combining deep learning, machine learning, and attention in this way is novel for emoji prediction in a code-mixed language (Hinglish). Results on the HEP dataset indicate that the BARF model outperformed previous multilingual and baseline emoji prediction models, achieving an accuracy of 61.14%, a precision of 0.66, a recall of 0.59, and an F1 score of 0.59.
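The attention step in the pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the attention formulation, so this assumes a simplified scaled dot-product pooling with a single query vector that would normally be learned. The names `attention_pool`, `hidden_states`, and `query`, the example tokens, and all numeric values are hypothetical; the hidden states stand in for BiLSTM outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Pool per-token vectors into one feature vector.

    Each token is scored by its scaled dot product with `query`
    (an assumed, normally learned vector); the scores are turned
    into weights with softmax and used for a weighted sum.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(h_d * q_d for h_d, q_d in zip(h, query)) / scale
        for h in hidden_states
    ]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return pooled, weights


# Hypothetical stand-in for BiLSTM outputs: one 4-dim vector per token.
tokens = ["mast", "movie", "thi", "yaar"]
hidden_states = [
    [0.9, 0.1, 0.8, 0.2],
    [0.2, 0.3, 0.1, 0.1],
    [0.1, 0.0, 0.2, 0.1],
    [0.5, 0.4, 0.6, 0.3],
]
query = [1.0, 0.0, 1.0, 0.0]  # assumed; learned during training in practice

features, weights = attention_pool(hidden_states, query)
for token, weight in zip(tokens, weights):
    print(f"{token:>6}: {weight:.3f}")
```

In a pipeline of the kind the abstract describes, the pooled `features` vector for each tweet would then be passed to a separate classifier, for example scikit-learn's `RandomForestClassifier`, instead of a softmax output layer.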
Related papers
50 in total
  • [1] A self-attention hybrid emoji prediction model for code-mixed language: (Hinglish)
    Himabindu, Gadde Satya Sai Naga
    Rao, Rajat
    Sethia, Divyashikha
    SOCIAL NETWORK ANALYSIS AND MINING, 2022, 12 (01)
  • [2] HiPHET: HYBRID APPROACH FOR TRANSLATING CODE-MIXED LANGUAGE (HINGLISH) TO PURE LANGUAGES (HINDI AND ENGLISH)
    Attri, Shree Harsh
    Prasad, T. V.
    Ramakrishna, G.
    COMPUTER SCIENCE-AGH, 2020, 21 (03)
  • [3] MUTANT: A Multi-sentential Code-mixed Hinglish Dataset
    Gupta, Rahul
    Srivastava, Vivek
    Singh, Mayank
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023 : 744 - 753
  • [4] Multitasking of sentiment detection and emotion recognition in code-mixed Hinglish data
    Ghosh, Soumitra
    Priyankar, Amit
    Ekbal, Asif
    Bhattacharyya, Pushpak
    KNOWLEDGE-BASED SYSTEMS, 2023, 260
  • [5] Word Embeddings for Code-Mixed Language Processing
    Pratapa, Adithya
    Choudhury, Monojit
    Sitaram, Sunayana
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018 : 3067 - 3072
  • [6] HIT: A Hierarchically Fused Deep Attention Network for Robust Code-mixed Language Representation
    Sengupta, Ayan
    Bhattacharjee, Sourabh Kumar
    Chakraborty, Tanmoy
    Akhtar, Md Shad
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021 : 4625 - 4639
  • [7] Does aggression lead to hate? Detecting and reasoning offensive traits in hinglish code-mixed texts
    Sengupta, Ayan
    Bhattacharjee, Sourabh Kumar
    Akhtar, Md Shad
    Chakraborty, Tanmoy
    NEUROCOMPUTING, 2022, 488 : 598 - 617
  • [8] Protein–protein interaction site prediction by model ensembling with hybrid feature and self-attention
    Cong, Hanhan
    Liu, Hong
    Cao, Yi
    Liang, Cheng
    Chen, Yuehui
    BMC Bioinformatics, 24
  • [9] CMHE-AN: Code mixed hybrid embedding based attention network for aggression identification in hindi english code-mixed text
    Shikha Mundra
    Namita Mittal
    Multimedia Tools and Applications, 2023, 82 : 11337 - 11364