SMOTETomek-Based Resampling for Personality Recognition

Cited by: 48
Authors
Wang, Zhe [1]
Wu, Chunhua [1]
Zheng, Kangfeng [1]
Niu, Xinxin [1]
Wang, Xiujuan [2]
Affiliations
[1] Beijing University of Posts and Telecommunications, School of Cyberspace Security, Beijing 100876, People's Republic of China
[2] Beijing University of Technology, School of Computer Science, Beijing 100124, People's Republic of China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
Personality recognition; PSO-SMOTETomek; sample distribution imbalance; Facebook
DOI
10.1109/ACCESS.2019.2940061
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The main challenge in user personality recognition is low accuracy caused by small sample sizes and severely imbalanced sample distributions. This paper analyzes how imbalanced data distribution and overlap between positive and negative samples affect machine learning classification models. The classification model is built on data resampling, which can improve classification accuracy; both problems can be solved once the data are effectively resampled. We present a personality prediction method, PSO-SMOTETomek, that combines particle swarm optimization (PSO) with synthetic minority oversampling technique + Tomek link (SMOTETomek) resampling: in addition to resampling the data samples with SMOTETomek, it performs PSO feature optimization for each set of feature combinations. Validated by simulation, our analysis shows that PSO-SMOTETomek is efficient on small datasets and improves personality recognition accuracy by up to around 10%. The results are better than those of previous similar studies. The average accuracies on the plain text and non-plain text datasets are 75.34% and 78.78%, respectively, and the average accuracies on the short text and long text datasets are 75.34% and 64.25%, respectively. The experimental results indicate that short text is classified more accurately than long text, and that plain text data can still achieve high personality discrimination accuracy even without relevant external information. The proposed model can facilitate the design and implementation of personality recognition systems, and it significantly outperforms existing state-of-the-art models.
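The resampling step named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a minimal, standard-library-only version of the general SMOTE + Tomek link idea, assuming a two-class problem, Euclidean distance, and at least two minority samples. The function names (`smote`, `remove_tomek_links`, `smote_tomek`) and the toy data are hypothetical, and the PSO feature optimization stage is not covered. In practice, the `SMOTETomek` class in the imbalanced-learn package (`imblearn.combine`) provides a maintained implementation of the same combination.

```python
import math
import random


def nearest(idx, points, candidates, k=1):
    """Indices of the k candidates closest to points[idx], excluding idx itself."""
    others = [j for j in candidates if j != idx]
    others.sort(key=lambda j: math.dist(points[idx], points[j]))
    return others[:k]


def smote(X, y, minority, k=3, rng=None):
    """Oversample the minority class until both classes have the same size.

    Each synthetic point is a random interpolation between a minority sample
    and one of its k nearest minority neighbours (two-class assumption).
    """
    rng = rng or random.Random(0)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_new = (len(y) - len(min_idx)) - len(min_idx)
    X_new, y_new = list(X), list(y)
    for _ in range(max(n_new, 0)):
        i = rng.choice(min_idx)
        j = rng.choice(nearest(i, X, min_idx, k))
        gap = rng.random()
        X_new.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
        y_new.append(minority)
    return X_new, y_new


def remove_tomek_links(X, y):
    """Drop both members of every Tomek link.

    A Tomek link is a pair of points from opposite classes that are each
    other's nearest neighbour; such pairs sit in the class-overlap region.
    """
    all_idx = range(len(X))
    nn = [nearest(i, X, all_idx, 1)[0] for i in all_idx]
    drop = {i for i in all_idx if nn[nn[i]] == i and y[i] != y[nn[i]]}
    keep = [i for i in all_idx if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_tomek(X, y, minority, k=3, seed=0):
    """SMOTE oversampling followed by Tomek link cleaning."""
    X_s, y_s = smote(X, y, minority, k, random.Random(seed))
    return remove_tomek_links(X_s, y_s)


# Toy, made-up data: six majority points (label 0) and two minority points (label 1).
X = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2), (0.2, 0.4), (0.4, 0.1),
     (1.0, 1.0), (1.1, 0.9)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
X_res, y_res = smote_tomek(X, y, minority=1)
```

On this toy data the two classes are well separated, so the Tomek step has nothing to remove and the effect comes entirely from oversampling. The cleaning step matters when minority and majority samples overlap, which is the situation the abstract describes.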
Pages: 129678-129689
Page count: 12