Deep learning based sentiment analysis and offensive language identification on multilingual code-mixed data

Cited by: 0

Authors:

Kogilavani Shanmugavadivel

V. E. Sathishkumar

Sandhiya Raja

T. Bheema Lingaiah

S. Neelakandan

Malliga Subramanian

Affiliations:

[1] Kongu Engineering College, Department of Artificial Intelligence

[2] Hanyang University, Department of Industrial Engineering

[3] Kongu Engineering College, Department of Information Technology

[4] Jimma Institute of Technology, Department of Biomedical Engineering

[5] R.M.K Engineering College, Department of Computer Science and Engineering

[6] Kongu Engineering College, Department of Computer Science and Engineering

Source:

Scientific Reports, Vol. 12

DOI:

Not available

Abstract:

Sentiment analysis is a Natural Language Processing task that involves detecting and classifying emotions in text. The emotion is directed at a specific target, such as an object, an incident, or an individual. While some tasks are concerned with detecting whether emotion is present in a text, others aim to determine its polarity, classified as positive, negative, or neutral. The task of determining whether a comment contains inappropriate text targeting an individual or a group is called offensive language identification. Existing research has concentrated more on sentiment analysis and offensive language identification in monolingual datasets than in code-mixed data. Code-mixed text is formed by combining words and phrases from two or more distinct languages in a single text. Identifying emotions or offensive terms in such comments is challenging because code-mixed data is noisy. Most advances in hostile language detection and sentiment analysis have been made on monolingual data for high-resource languages. The proposed system performs both sentiment analysis and offensive language identification on low-resource Tamil-English code-mixed data using machine learning, deep learning, and pre-trained models such as BERT, RoBERTa, and adapter-BERT. The dataset used in this work is taken from the multi-task learning shared task at DravidianLangTech@ACL2022. Another challenge addressed by this work is extracting semantically meaningful information from code-mixed data using word embeddings. The results show that the adapter-BERT model outperforms the other trained models, achieving an accuracy of 65% for sentiment analysis and 79% for offensive language identification.
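The abstract reports adapter-BERT as the strongest model but does not describe its architecture. As a hedged illustration only, the sketch below shows the general bottleneck-adapter idea that adapter-BERT typically refers to (Houlsby et al., 2019): a small trainable module inserted into each frozen BERT layer, so that only a few parameters are fine-tuned per task. The hidden size of 768 matches BERT-base; the bottleneck size of 64, the GELU non-linearity, and the class name `BottleneckAdapter` are illustrative assumptions, not details taken from this paper.

```python
import numpy as np

# Illustrative sketch with hypothetical sizes; NOT the authors' implementation.


def gelu(x: np.ndarray) -> np.ndarray:
    """Tanh approximation of the GELU activation used in BERT."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x ** 3)))


class BottleneckAdapter:
    """Bottleneck adapter: down-project, non-linearity, up-project, residual add.

    Only these small matrices would be trained; the surrounding BERT weights stay frozen.
    """

    def __init__(self, hidden_size: int = 768, bottleneck: int = 64, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Small random down-projection; zero up-projection so the adapter
        # starts out as an identity function and leaves BERT's output unchanged.
        self.w_down = rng.normal(0.0, 0.02, size=(hidden_size, bottleneck))
        self.b_down = np.zeros(bottleneck)
        self.w_up = np.zeros((bottleneck, hidden_size))
        self.b_up = np.zeros(hidden_size)

    def __call__(self, h: np.ndarray) -> np.ndarray:
        # h: (seq_len, hidden_size) hidden states from a frozen transformer sublayer
        z = gelu(h @ self.w_down + self.b_down)  # (seq_len, bottleneck)
        return h + z @ self.w_up + self.b_up     # residual connection

    def num_params(self) -> int:
        # Number of trainable parameters in this single adapter module
        return sum(p.size for p in (self.w_down, self.b_down, self.w_up, self.b_up))


adapter = BottleneckAdapter(hidden_size=768, bottleneck=64)
h = np.random.default_rng(1).normal(size=(16, 768))  # hypothetical 16-token sequence
out = adapter(h)
print(out.shape, np.allclose(out, h), adapter.num_params())
```

Because `w_up` is initialized to zero, the adapter initially returns its input unchanged, so inserting it does not disturb the pre-trained model; fine-tuning then updates only the 99,136 parameters of each adapter module (for these sizes) instead of the roughly 110M parameters of BERT-base.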
