SMOTETomek-Based Resampling for Personality Recognition

Cited by: 48
Authors
Wang, Zhe [1]
Wu, Chunhua [1]
Zheng, Kangfeng [1]
Niu, Xinxin [1]
Wang, Xiujuan [2]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Cyberspace Secur, Beijing 100876, Peoples R China
[2] Beijing Univ Technol, Sch Comp Sci, Beijing 100124, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Personality recognition; PSO-SMOTETomek; sample distribution imbalance; FACEBOOK;
DOI
10.1109/ACCESS.2019.2940061
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The main challenge of user personality recognition is low accuracy resulting from small sample size and severe sample distribution imbalance. This paper analyzes the impact of imbalanced data distribution and positive and negative sample overlap on the machine learning classification model. The classification model is based on the data resampling technique, which can improve the classification accuracy. These problems can be solved once the data are effectively resampled. We present a personality prediction method based on particle swarm optimization (PSO) and synthetic minority oversampling technique+Tomek Link (SMOTETomek) resampling (PSO-SMOTETomek), which, apart from effective SMOTETomek resampling of data samples, is able to execute PSO feature optimization for each set of feature combinations. Validated by simulation, our analysis reveals that the PSO-SMOTETomek method is efficient under a small dataset, and the accuracy of personality recognition is improved by up to around 10%. The results are better than those of previous similar studies. The average accuracies of the plain text dataset and the non-plain text dataset are 75.34% and 78.78%, respectively. The average accuracies of the short text dataset and the long text dataset are 75.34% and 64.25%, respectively. From the experimental results, we found that short text has a better classification effect than long text. Plain text data can still have high personality discrimination accuracy, but there is no relevant external information. The proposed model is able to facilitate the design and implementation of a personality recognition system, and the model significantly outperforms existing state-of-the-art models.
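The resampling step named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and it leaves out the PSO feature optimization entirely. The function names (`smote`, `remove_tomek_links`, `smote_tomek`), the choice of `k`, and the toy data are all assumptions made for illustration. The sketch first oversamples the minority class by interpolating between nearby minority samples (SMOTE), then removes Tomek links, that is, pairs of opposite-class samples that are each other's nearest neighbour.

```python
import math
import random


def nearest(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(minority[i], minority[j]))[:k]
        j = rng.choice(neighbours)
        gap = rng.random()  # position along the segment between the two points
        synthetic.append(tuple(a + gap * (b - a)
                               for a, b in zip(minority[i], minority[j])))
    return synthetic


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link (mutual nearest neighbours
    that carry different labels)."""
    nn = [nearest(i, X) for i in range(len(X))]
    drop = {i for i in range(len(X))
            if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in range(len(X)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority_label=1, k=3, seed=0):
    """Balance the classes with SMOTE, then clean overlap with Tomek links."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = max(len(X) - 2 * len(minority), 0)  # majority count minus minority count
    new_points = smote(minority, n_new, k=k, seed=seed)
    X_res = list(X) + new_points
    y_res = list(y) + [minority_label] * len(new_points)
    return remove_tomek_links(X_res, y_res)


# Toy data (invented): 8 majority samples near the origin, 3 minority samples far away.
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
     (0.4, 0.0), (0.0, 0.4), (0.5, 0.5), (0.3, 0.4),
     (2.0, 2.0), (2.2, 2.1), (2.1, 1.8)]
y = [0] * 8 + [1] * 3
X_res, y_res = smote_tomek(X, y)
print(y_res.count(0), y_res.count(1))
```

Because the two toy classes are well separated, no Tomek links arise here and the cleaning step removes nothing. On overlapping data the same step would discard the boundary pairs, which is the positive/negative sample overlap the abstract refers to. In practice a maintained library such as imbalanced-learn, which provides a `SMOTETomek` class, would normally be preferred over a hand-written version like this one.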
Pages: 129678-129689
Number of pages: 12