Deep learning based sentiment analysis and offensive language identification on multilingual code-mixed data

Cited by: 0
Authors
Kogilavani Shanmugavadivel
V. E. Sathishkumar
Sandhiya Raja
T. Bheema Lingaiah
S. Neelakandan
Malliga Subramanian
Affiliations
[1] Kongu Engineering College, Department of Artificial Intelligence
[2] Hanyang University, Department of Industrial Engineering
[3] Kongu Engineering College, Department of Information Technology
[4] Jimma Institute of Technology, Department of Biomedical Engineering
[5] R.M.K Engineering College, Department of Computer Science and Engineering
[6] Kongu Engineering College, Department of Computer Science and Engineering
Source
Keywords
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Sentiment analysis is a Natural Language Processing task that involves detecting and classifying emotions in text. The emotion is directed at a specific thing, such as an object, an incident, or an individual. Some tasks are concerned with detecting whether emotion is present in a text, while others determine its polarity, classifying it as positive, negative, or neutral. The task of determining whether a comment contains inappropriate text that affects an individual or a group is called offensive language identification. Existing research has concentrated more on sentiment analysis and offensive language identification in monolingual datasets than in code-mixed data. Code-mixed data is formed by combining words and phrases from two or more distinct languages in a single text. Identifying emotion or offensive terms in such comments is challenging because code-mixed data is noisy. Most advances in offensive language detection and sentiment analysis have been made on monolingual data for high-resource languages. The proposed system performs both sentiment analysis and offensive language identification on low-resource code-mixed Tamil-English data using machine learning, deep learning, and pre-trained models such as BERT, RoBERTa, and adapter-BERT. The dataset used in this work is taken from the shared task on multi-task learning at DravidianLangTech@ACL2022. Another challenge addressed by this work is the extraction of semantically meaningful information from code-mixed data using word embeddings. The results show that the adapter-BERT model achieves higher accuracy than the other trained models: 65% for sentiment analysis and 79% for offensive language identification.
Related papers
50 in total
  • [1] Deep learning based sentiment analysis and offensive language identification on multilingual code-mixed data
    Shanmugavadivel, Kogilavani
    Sathishkumar, V. E.
    Raja, Sandhiya
    Lingaiah, T. Bheema
    Neelakandan, S.
    Subramanian, Malliga
    [J]. SCIENTIFIC REPORTS, 2022, 12 (01)
  • [2] DravidianCodeMix: sentiment analysis and offensive language identification dataset for Dravidian languages in code-mixed text
    Bharathi Raja Chakravarthi
    Ruba Priyadharshini
    Vigneshwaran Muralidaran
    Navya Jose
    Shardul Suryawanshi
    Elizabeth Sherly
    John P. McCrae
    [J]. Language Resources and Evaluation, 2022, 56 : 765 - 806
  • [3] DravidianCodeMix: sentiment analysis and offensive language identification dataset for Dravidian languages in code-mixed text
    Chakravarthi, Bharathi Raja
    Priyadharshini, Ruba
    Muralidaran, Vigneshwaran
    Jose, Navya
    Suryawanshi, Shardul
    Sherly, Elizabeth
    McCrae, John P.
    [J]. LANGUAGE RESOURCES AND EVALUATION, 2022, 56 (03) : 765 - 806
  • [4] Transformer based multilingual joint learning framework for code-mixed and english sentiment analysis
    Mamta
    Ekbal, Asif
    [J]. JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2024, 62 (01) : 231 - 253
  • [5] Transformer based multilingual joint learning framework for code-mixed and english sentiment analysis
    Mamta
    Asif Ekbal
    [J]. Journal of Intelligent Information Systems, 2024, 62 (1) : 231 - 253
  • [6] MULTILINGUAL CODE-MIXED SENTIMENT ANALYSIS IN HATE SPEECH
    Ranjan, Tulika
    Singh, Anish
    Kumari, Rina
    Swain, Sujata
    Bandyopadhyay, Anjan
    Parida, Ajaya Kumar
    [J]. SCALABLE COMPUTING-PRACTICE AND EXPERIENCE, 2023, 24 (04) : 873 - 882
  • [7] Sentiment Analysis and Offensive Language Identification in Code-Mixed Tamil-English Languages Using Transformer-Based Models
    Ponnambalam, Satheesh Kumar
    Desai, Darshana
    [J]. ADVANCED NETWORK TECHNOLOGIES AND INTELLIGENT COMPUTING, ANTIC 2023, PT III, 2024, 2092 : 149 - 167
  • [8] An analysis of machine learning models for sentiment analysis of Tamil code-mixed data
    Shanmugavadivel, Kogilavani
    Sampath, Sai Haritha
    Nandhakumar, Pramod
    Mahalingam, Prasath
    Subramanian, Malliga
    Kumaresan, Prasanna Kumar
    Priyadharshini, Ruba
    [J]. COMPUTER SPEECH AND LANGUAGE, 2022, 76
  • [9] Meta-Learning for Offensive Language Detection in Code-Mixed Texts
    Suresh, Gautham Vadakkekara
    Chakravarthi, Bharathi Raja
    McCrae, John P.
    [J]. FIRE 2021: PROCEEDINGS OF THE 13TH ANNUAL MEETING OF THE FORUM FOR INFORMATION RETRIEVAL EVALUATION, 2021, : 58 - 66
  • [10] Towards Offensive Language Identification for Tamil Code-Mixed YouTube Comments and Posts
    Charangan Vasantharajan
    Uthayasanker Thayasivam
    [J]. SN Computer Science, 2022, 3 (1)