Arabic Authorship Attribution Using Synthetic Minority Over-Sampling Technique and Principal Components Analysis for Imbalanced Documents

Cited by: 4
Authors
Hadjadj, Hassina [1]
Sayoud, Halim [1]
Affiliation
[1] USTHB Univ, Bab Ezzouar, Algeria
Keywords
Arabic Language; Authorship Attribution; BayesNet; Imbalanced Datasets; Principal Component Analysis (PCA); SMO-SVM; Synthetic Minority Over-Sampling Technique (SMOTE); SMOTE
DOI
10.4018/IJCINI.20211001.oa33
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Dealing with imbalanced data is a major challenge in data mining and in machine learning. In this investigation, the authors address the problem of class imbalance in the authorship attribution (AA) task, with a specific application to Arabic text. This article proposes a new hybrid approach based on principal components analysis (PCA) and the synthetic minority over-sampling technique (SMOTE), which considerably improves authorship attribution performance on imbalanced data. The dataset contains seven Arabic books written by seven different scholars; the books are split into text segments of roughly the same size, with an average length of 2,900 words per segment. The experimental results show that the proposed approach, combined with the SMO-SVM classifier, achieves high authorship attribution accuracy (100%), especially with starting character bigrams. In addition, the proposed method improves AA performance on imbalanced datasets, mainly with function-word features.
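The abstract describes the PCA + SMOTE hybrid only at a high level. Below is a minimal Python/NumPy sketch of one plausible pipeline, not the authors' implementation. It assumes each text segment has already been turned into a numeric feature vector (for example character-bigram or function-word frequencies), and it assumes PCA is applied first and SMOTE second in the reduced space, an order the abstract does not state. The function names `pca_fit_transform`, `smote` and `balance_with_smote` are hypothetical. The oversampling step follows the standard SMOTE formulation of Chawla et al. (2002): each synthetic sample is placed at a random point on the segment between a minority-class sample and one of its k nearest minority-class neighbours.

```python
import numpy as np


def pca_fit_transform(X, n_components):
    """Project the rows of X onto the top n_components principal axes (SVD of centred data)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are orthonormal principal axes, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, mean, components


def smote(X_min, n_new, k=5, rng=None):
    """Create n_new synthetic rows from the minority-class rows X_min (needs >= 2 rows)."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances inside the minority class only.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)                  # random seed samples
    nb = neighbours[base, rng.integers(0, k, size=n_new)]  # one random neighbour each
    gap = rng.random((n_new, 1))                           # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


def balance_with_smote(X, y, k=5, rng=None):
    """Oversample every class with >= 2 rows up to the size of the largest class."""
    rng = np.random.default_rng(rng)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    Xs, ys = [X], [y]
    for c, cnt in zip(classes, counts):
        if 1 < cnt < target:  # single-row classes have no neighbour and are left as-is
            Xs.append(smote(X[y == c], target - cnt, k, rng))
            ys.append(np.full(target - cnt, c))
    return np.vstack(Xs), np.concatenate(ys)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 40 segments x 50 features, classes of size 25 / 10 / 5 (all hypothetical).
    X = rng.normal(size=(40, 50))
    y = np.array([0] * 25 + [1] * 10 + [2] * 5)
    Z, mean, comps = pca_fit_transform(X, n_components=10)
    Zb, yb = balance_with_smote(Z, y, k=3, rng=1)
    print(Zb.shape, np.bincount(yb))  # (75, 10) [25 25 25]
```

In practice the oversampling should be fitted on the training split only, so that synthetic segments derived from test segments never leak into evaluation. The classifier stage (SMO-SVM or BayesNet in the keywords) is not shown.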
Pages: 17
Related papers (50 in total)
  • [1] Imbalanced data classification using improved synthetic minority over-sampling technique
    Anusha, Yamijala
    Visalakshi, R.
    Srinivas, Konda
    [J]. MULTIAGENT AND GRID SYSTEMS, 2023, 19 (02) : 117 - 131
  • [2] Handling Autism Imbalanced Data using Synthetic Minority Over-Sampling Technique (SMOTE)
    El-Sayed, Asmaa Ahmed
    Meguid, Nagwa Abdel
    Mahmood, Mahmood Abdel Manem
    Hefny, Hesham Ahmed
    [J]. PROCEEDINGS OF 2015 THIRD IEEE WORLD CONFERENCE ON COMPLEX SYSTEMS (WCCS), 2015
  • [3] SMOTE: Synthetic minority over-sampling technique
    Chawla, Nitesh V.
    Bowyer, Kevin W.
    Hall, Lawrence O.
    Kegelmeyer, W. Philip
    [J]. American Association for Artificial Intelligence, 2002, 16
  • [4] Software defect prediction with imbalanced distribution by radius-synthetic minority over-sampling technique
    Guo, Shikai
    Dong, Jian
    Li, Hui
    Wang, Jiahui
    [J]. JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2021, 33 (07)
  • [5] AN IMBALANCED SIGNAL MODULATION CLASSIFICATION AND EVALUATION METHOD BASED ON SYNTHETIC MINORITY OVER-SAMPLING TECHNIQUE
    Liu, Xuebo
    Wang, Yiran
    Bai, Jing
    Li, Haoran
    Wang, Xu
    [J]. IGARSS 2023 - 2023 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2023: 6224 - 6227
  • [6] Enhancing Cascade Quality Prediction Method in Handling Imbalanced Dataset Using Synthetic Minority Over-Sampling Technique
    Julian, Fajar Azhari
    Arif, Fahmi
    [J]. INDUSTRIAL ENGINEERING AND MANAGEMENT SYSTEMS, 2023, 22 (04): 389 - 398
  • [7] Classification of imbalanced PubChem BioAssay data using an efficient algorithm coupled with synthetic minority over-sampling technique
    Hao, Ming
    Wang, Yanli
    Bryant, Stephen H.
    [J]. ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2014, 247
  • [8] Predicting Nurse Turnover for Highly Imbalanced Data Using the Synthetic Minority Over-Sampling Technique and Machine Learning Algorithms
    Xu, Yuan
    Park, Yongshin
    Park, Ju Dong
    Sun, Bora
    [J]. HEALTHCARE, 2023, 11 (24)
  • [9] An efficient algorithm coupled with synthetic minority over-sampling technique to classify imbalanced PubChem BioAssay data
    Hao, Ming
    Wang, Yanli
    Bryant, Stephen H.
    [J]. ANALYTICA CHIMICA ACTA, 2014, 806 : 117 - 127
  • [10] Synthetic Minority Over-Sampling Technique based on Fuzzy C-means Clustering for Imbalanced Data
    Lee, Hansoo
    Jung, Seunghyan
    Kim, Minseok
    Kimt, Sungshin
    [J]. 2017 INTERNATIONAL CONFERENCE ON FUZZY THEORY AND ITS APPLICATIONS (IFUZZY), 2017