Tackling cyber-aggression: Identification and fine-grained categorization of aggressive texts on social media using weighted ensemble of transformers

Cited by: 0
Authors
Sharif, Omar [1]
Hoque, Mohammed Moshiul [1]
Affiliation
[1] Department of CSE, Chittagong University of Engineering and Technology, Chittagong, 4349, Bangladesh
Keywords
Classification (of information);
DOI
Not available
Abstract
The pervasiveness of aggressive content on social media has become a serious concern for government organizations and tech companies because of its pernicious societal effects. In recent years, social media has repeatedly been used as a tool to incite communal aggression, spread distorted propaganda, damage social harmony and demean the identity of individuals or communities in public spaces. Therefore, detecting aggressive content and restraining its proliferation has become an urgent duty. Studies on identifying aggressive content have mostly been conducted for English and other high-resource languages. Automatic systems developed for those languages cannot accurately identify detrimental content written in regional languages like Bengali. To compensate for this insufficiency, this work presents a novel Bengali aggressive text dataset (called ‘BAD’) with two-level annotation. In level-A, 14158 texts are labeled as either aggressive or non-aggressive, while in level-B, 6807 aggressive texts are categorized into religious, political, verbal and gendered aggression classes, which contain 2217, 2085, 2043 and 462 texts respectively. This paper proposes a weighted ensemble technique that uses m-BERT, distil-BERT, Bangla-BERT and XLM-R as base classifiers to identify and classify aggressive texts in Bengali. The proposed model can readjust the softmax probabilities of the participating classifiers depending on their primary outcomes. This weighting technique enables the model to outperform the simple average ensemble and all other machine learning (ML) and deep learning (DL) baselines. It achieves the highest weighted F1-score of 93.43% in the identification task and 93.11% in the categorization task. The dataset developed as part of this work is available at https://github.com/BAD-Bangla-Aggressive-Text-Dataset. © 2021 Elsevier B.V.
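The abstract does not state the exact weighting formula, so the following is only a minimal sketch of one common way to build a weighted ensemble over softmax outputs. It assumes that each base classifier's weight is proportional to a score it earned earlier (for example, a validation weighted F1-score); the function names `weighted_ensemble` and `predict`, and all numbers in the usage note, are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def weighted_ensemble(
    probs: Sequence[Sequence[Sequence[float]]],
    scores: Sequence[float],
) -> List[List[float]]:
    """Combine per-model softmax outputs with score-proportional weights.

    probs[m][i][c] is model m's probability of class c for sample i.
    scores[m] is model m's earlier score (ASSUMPTION: e.g. validation
    weighted F1); the paper's actual weighting rule may differ.
    """
    if len(probs) != len(scores):
        raise ValueError("need exactly one score per model")
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")

    # Normalise the scores so the weights sum to 1.
    weights = [s / total for s in scores]

    n_samples = len(probs[0])
    n_classes = len(probs[0][0])
    combined = [[0.0] * n_classes for _ in range(n_samples)]

    # Accumulate each model's probabilities, scaled by its weight.
    for w, model_probs in zip(weights, probs):
        for i, row in enumerate(model_probs):
            for c, p in enumerate(row):
                combined[i][c] += w * p
    return combined


def predict(combined: Sequence[Sequence[float]]) -> List[int]:
    """Return the index of the highest combined probability per sample."""
    return [max(range(len(row)), key=row.__getitem__) for row in combined]
```

As a usage illustration with made-up values: three models that output `[[0.6, 0.4], [0.3, 0.7]]`, `[[0.4, 0.6], [0.2, 0.8]]` and `[[0.45, 0.55], [0.6, 0.4]]` with scores `[0.9, 0.5, 0.6]` receive weights 0.45, 0.25 and 0.30. For the first sample the weighted ensemble favours class 0, whereas a simple average of the same outputs would favour class 1, which shows how the weighting can change the decision.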
Pages: 462 - 481
Related papers
12 in total
  • [1] Tackling cyber-aggression: Identification and fine-grained categorization of aggressive texts on social media using weighted ensemble of transformers
    Sharif, Omar
    Hoque, Mohammed Moshiul
    [J]. NEUROCOMPUTING, 2022, 490 : 462 - 481
  • [2] Bilingual Cyber-aggression detection on social media using LSTM autoencoder
    Kirti Kumari
    Jyoti Prakash Singh
    Yogesh Kumar Dwivedi
    Nripendra Pratap Rana
    [J]. Soft Computing, 2021, 25 : 8999 - 9012
  • [3] Bilingual Cyber-aggression detection on social media using LSTM autoencoder
    Kumari, Kirti
    Singh, Jyoti Prakash
    Dwivedi, Yogesh Kumar
    Rana, Nripendra Pratap
    [J]. SOFT COMPUTING, 2021, 25 (14) : 8999 - 9012
  • [4] Fine-Grained POS Tagging of German Social Media and Web Texts
    Thater, Stefan
    [J]. LANGUAGE TECHNOLOGIES FOR THE CHALLENGES OF THE DIGITAL AGE, GSCL 2017, 2018, 10713 : 72 - 80
  • [5] Detecting Depression in Social Media using Fine-Grained Emotions
    Ezra Aragon, Mario
    Pastor Lopez-Monroy, A.
    Gonzalez-Gurrola, Luis C.
    Montes-y-Gomez, Manuel
    [J]. 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019 : 1481 - 1486
  • [6] Weakly-Supervised Fine-Grained Event Recognition on Social Media Texts for Disaster Management
    Yao, Wenlin
    Zhang, Cheng
    Saravanan, Shiva
    Huang, Ruihong
    Mostafavi, Ali
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 532 - 539
  • [7] TIDE: Affective Time-aware Representations for Fine-grained Depression Identification on Social Media
    Liu, Zhuanzhuan
    Ma, Xing
    Zhang, Peng
    Hao, Chuzhan
    Zhang, Shuo
    Wang, Lin
    [J]. 2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
  • [8] TWEETSPIN: Fine-grained Propaganda Detection in Social Media Using Multi-View Representations
    Vijayaraghavan, Prashanth
    Vosoughi, Soroush
    [J]. NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022 : 3433 - 3448
  • [9] Fine-Grained Subjective Partitioning of Urban Space Using Human Interactions From Social Media Data
    Qiao, Mengling
    Wang, Yandong
    Wu, Shanmei
    Luo, An
    Ruan, Shisi
    Gu, Yanyan
    [J]. IEEE ACCESS, 2019, 7 : 52085 - 52094
  • [10] Fine-grained assessment of greenspace satisfaction at regional scale using content analysis of social media and machine learning
    Wang, Zhifang
    Zhu, Zhongwei
    Xu, Min
    Qureshi, Salman
    [J]. SCIENCE OF THE TOTAL ENVIRONMENT, 2021, 776