SMOTETomek-Based Resampling for Personality Recognition

Cited by: 48
Authors
Wang, Zhe [1]
Wu, Chunhua [1]
Zheng, Kangfeng [1]
Niu, Xinxin [1]
Wang, Xiujuan [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Cyberspace Secur, Beijing 100876, Peoples R China
[2] Beijing Univ Technol, Sch Comp Sci, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Personality recognition; PSO-SMOTETomek; sample distribution imbalance; FACEBOOK;
DOI
10.1109/ACCESS.2019.2940061
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The main challenge in user personality recognition is low accuracy caused by small sample sizes and severely imbalanced sample distributions. This paper analyzes how imbalanced data distribution and overlap between positive and negative samples affect machine learning classification models. Both problems can be addressed by effectively resampling the data, so the classification model is built on a data resampling technique to improve classification accuracy. We present a personality prediction method, PSO-SMOTETomek, based on particle swarm optimization (PSO) and synthetic minority oversampling technique plus Tomek links (SMOTETomek) resampling. In addition to resampling the data samples with SMOTETomek, the method performs PSO feature optimization for each set of feature combinations. Simulation results show that PSO-SMOTETomek is efficient on a small dataset and improves personality recognition accuracy by up to about 10%. The results are better than those of previous similar studies. The average accuracies on the plain text and non-plain text datasets are 75.34% and 78.78%, respectively, and those on the short text and long text datasets are 75.34% and 64.25%, respectively. The experimental results show that short text yields better classification than long text. Plain text data can still achieve high personality discrimination accuracy even without relevant external information. The proposed model can facilitate the design and implementation of personality recognition systems, and it significantly outperforms existing state-of-the-art models.
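The resampling idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses only the Python standard library, it omits the PSO feature optimization step entirely, and the function names (`smote`, `remove_tomek_links`, `smote_tomek`), the toy data, and the parameter defaults are all hypothetical. It assumes two classes and at least two minority samples. SMOTE creates synthetic minority points by interpolating between a minority point and one of its nearest minority neighbours; a Tomek link is a pair of opposite-class points that are each other's nearest neighbour, and here both members of each link are dropped (one common choice).

```python
import math
import random


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each on the segment between a random
    minority point and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(minority[i], minority[j]),
        )[:k]
        j = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j]))
        )
    return synthetic


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link (mutual nearest neighbours
    that carry different class labels)."""
    n = len(X)
    nearest = [
        min((j for j in range(n) if j != i), key=lambda j: math.dist(X[i], X[j]))
        for i in range(n)
    ]
    drop = {
        i for i in range(n)
        if nearest[nearest[i]] == i and y[i] != y[nearest[i]]
    }
    keep = [i for i in range(n) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class up to the majority size with SMOTE,
    then clean class overlap by removing Tomek links."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(X) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k, random.Random(seed))
    X_bal = list(X) + new_points
    y_bal = list(y) + [minority_label] * len(new_points)
    return remove_tomek_links(X_bal, y_bal)


# Toy two-feature data: 8 majority samples (label 0), 3 minority samples (label 1).
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.4), (0.5, 0.5), (0.4, 0.1),
     (0.3, 0.3), (0.6, 0.2), (0.9, 0.9), (1.0, 1.0), (1.2, 0.9), (1.1, 1.3)]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
X_res, y_res = smote_tomek(X, y)
print(len(X_res), y_res.count(0), y_res.count(1))
```

Because every Tomek link contains exactly one sample from each class, removing both members keeps the two classes equal in size after the SMOTE step has balanced them. A feature-selection step such as PSO would then operate on the resampled data.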
Pages: 129678-129689
Page count: 12