Use of Data Augmentation Techniques in Detection of Antisocial Behavior Using Deep Learning Methods

Cited by: 7
Authors
Maslej-Kresnakova, Viera [1]
Sarnovsky, Martin [1]
Jackova, Julia [1]
Affiliations
[1] Tech Univ Kosice, Fac Elect Engn & Informat, Dept Cybernet & Artificial Intelligence, Kosice 04001, Slovakia
Source
FUTURE INTERNET | 2022 / Vol. 14 / Issue 09
Keywords
data augmentation; EDA; deep learning; antisocial behavior; fake news detection; toxic comments
DOI
10.3390/fi14090260
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper examines data augmentation techniques for the detection of antisocial behavior. Data augmentation is frequently used to overcome a lack of data or imbalanced classes: it generates artificial samples that either enlarge the training set or balance the target distribution. In antisocial behavior detection, both issues arise often, namely a shortage of quality labeled data and class imbalance. As most of the data in this domain is textual, we must consider augmentation methods suitable for NLP tasks. Easy data augmentation (EDA) is a group of such methods that use simple text transformations to create new, artificial samples. Our main motivation is to explore the usability of EDA techniques on selected tasks from the antisocial behavior detection domain. We focus on the class imbalance problem and apply EDA techniques to two tasks: fake news classification and toxic comments classification. In both cases, we train a convolutional neural network classifier and compare its performance on the original and the EDA-extended datasets. EDA techniques prove to be very task-dependent, with certain limitations resulting from the data they are applied to. On the extended toxic comments dataset, the model improved only marginally, gaining 0.01 in F1, and only when a subset of the EDA methods was applied. In this case, EDA techniques were not well suited to texts written in more informal language. On the fake news dataset, by contrast, the improvement was more significant, with the F1 score rising by 0.1. The improvement was largest in the prediction of the minority class, where F1 rose from 0.67 to 0.86.
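The idea of balancing a minority class with simple text transformations can be illustrated with a minimal Python sketch. It is not the authors' implementation: the four operations follow the commonly described EDA set (synonym replacement, random insertion, random swap, random deletion), while `TOY_SYNONYMS`, the function names, the default parameters, and the binary-only balancing rule are assumptions made purely for illustration.

```python
import random

# Hypothetical toy thesaurus (assumption); a real setup would use a lexical
# resource such as WordNet instead of a hand-written dictionary.
TOY_SYNONYMS = {
    "fake": ["false", "bogus"],
    "news": ["report", "story"],
    "claims": ["asserts", "states"],
}

def synonym_replacement(words, n, rng):
    """Replace up to n words that have an entry in the toy thesaurus."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in TOY_SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(TOY_SYNONYMS[out[i]])
    return out

def random_insertion(words, n, rng):
    """Insert a synonym of a random known word at a random position, n times."""
    out = list(words)
    for _ in range(n):
        known = [w for w in out if w in TOY_SYNONYMS]
        if not known:
            break
        synonym = rng.choice(TOY_SYNONYMS[rng.choice(known)])
        out.insert(rng.randint(0, len(out)), synonym)
    return out

def random_swap(words, n, rng):
    """Swap two randomly chosen positions, n times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]

def augment(text, rng, n=1, p=0.1):
    """Apply one randomly chosen EDA operation to a whitespace-tokenized text."""
    words = text.split()
    operations = [
        lambda w: synonym_replacement(w, n, rng),
        lambda w: random_insertion(w, n, rng),
        lambda w: random_swap(w, n, rng),
        lambda w: random_deletion(w, p, rng),
    ]
    return " ".join(rng.choice(operations)(words))

def balance_with_eda(texts, labels, minority_label, seed=0):
    """Append augmented minority samples until both classes are equal in size.

    Assumes a binary task; the original samples are kept unchanged.
    """
    rng = random.Random(seed)
    minority = [t for t, y in zip(texts, labels) if y == minority_label]
    new_texts, new_labels = list(texts), list(labels)
    if not minority:
        return new_texts, new_labels
    deficit = (len(labels) - len(minority)) - len(minority)
    for _ in range(max(deficit, 0)):
        new_texts.append(augment(rng.choice(minority), rng))
        new_labels.append(minority_label)
    return new_texts, new_labels
```

In a setup like the one described above, only the training split would be passed through `balance_with_eda`, so that the evaluation data stays free of artificial samples. Which operations help is task-dependent, as the abstract notes for informal text.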
Pages: 15