Deep learning based sentiment analysis and offensive language identification on multilingual code-mixed data

Authors
Kogilavani Shanmugavadivel
V. E. Sathishkumar
Sandhiya Raja
T. Bheema Lingaiah
S. Neelakandan
Malliga Subramanian
Affiliations
[1] Kongu Engineering College, Department of Artificial Intelligence
[2] Hanyang University, Department of Industrial Engineering
[3] Kongu Engineering College, Department of Information Technology
[4] Jimma Institute of Technology, Department of Biomedical Engineering
[5] R.M.K Engineering College, Department of Computer Science and Engineering
[6] Kongu Engineering College, Department of Computer Science and Engineering
Abstract
Sentiment analysis is a Natural Language Processing task that involves detecting and classifying emotions in text. The emotion is directed at a specific thing: an object, an incident, or an individual. Some tasks are concerned with detecting whether emotion is present in a text, while others are concerned with determining its polarity, classified as positive, negative, or neutral. Offensive language identification is the task of determining whether a comment contains inappropriate text that targets an individual or a group. Existing research has concentrated more on sentiment analysis and offensive language identification in monolingual data sets than in code-mixed data, and most advances in both tasks have been made on monolingual data for high-resource languages. Code-mixed data is formed by combining words and phrases from two or more distinct languages in a single text. Because code-mixed data is noisy, identifying emotion or offensive terms in such comments is quite challenging. The proposed system performs both sentiment analysis and offensive language identification on low-resource code-mixed Tamil and English data using machine learning, deep learning, and pre-trained models such as BERT, RoBERTa, and adapter-BERT. The dataset used in this work is taken from a shared task on Multi task learning at DravidianLangTech@ACL2022. Another challenge addressed by this work is the extraction of semantically meaningful information from code-mixed data using word embeddings. The results show that the adapter-BERT model achieves the best accuracy among the trained models: 65% for sentiment analysis and 79% for offensive language identification.