Feature Dimension Reduction Optimization Algorithm for Massive Micro-Blog Data based on Hadoop

Cited by: 0
Authors
Zhu H. [1]
Li W. [1]
Li H. [1]
Affiliation
[1] School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou
Funding
National Natural Science Foundation of China
Keywords
Feature dimension reduction; Feature selection; Hadoop; HDFS; Micro-blogging emotion;
DOI
10.23940/ijpe.19.06.p3.15181527
Abstract
In micro-blog sentiment analysis for big data environments, the "dimension disaster" caused by the continuous growth of text data poses a major challenge. To address this problem, this paper proposes an algorithm that fuses the advantages of three traditional feature dimensionality reduction methods: document frequency (DF), mutual information (MI), and the chi-square test (CHI). First, a document frequency factor is added to the MI algorithm to correct its bias toward low-frequency words. Then, a standard score factor is added to the CHI algorithm to address the negative correlation problem. Finally, the average value is calculated so that the advantages of the three algorithms are fused, yielding an improved DF-MI-CHI fusion algorithm. The simulation results show that after micro-blog data is processed with this algorithm, the accuracy of sentiment analysis is improved and maintained at 95%, the recall rate is more than 90%, and the F value is maintained between 92% and 94%. These values are higher than those of other improved algorithms and tend to be stable, which indicates that the algorithm can effectively improve the accuracy and efficiency of micro-blog sentiment analysis when dealing with massive micro-blog text data. © 2019 Totem Publisher, Inc. All rights reserved.
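The abstract does not give the formulas, so the following is only a minimal sketch of the three baseline scores and an averaged fusion, not the authors' method. It uses the textbook definitions of DF, pointwise MI, and the chi-square statistic for one term against one class. The paper's document frequency factor and standard score factor are not reproduced here; the min-max normalization before averaging, and the names `term_scores`, `min_max`, and `fused_ranking`, are assumptions made for illustration.

```python
import math
from collections import Counter


def term_scores(docs, labels, target):
    """Return {term: (df, mi, chi)} for one target class.

    docs   : list of token lists
    labels : list of class labels, same length as docs
    target : the class that terms are scored against
    Standard textbook definitions; NOT the paper's modified formulas.
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    df_all = Counter()  # number of documents containing the term
    df_pos = Counter()  # number of target-class documents containing the term
    for tokens, y in zip(docs, labels):
        for t in set(tokens):
            df_all[t] += 1
            if y == target:
                df_pos[t] += 1

    scores = {}
    for t, df in df_all.items():
        a = df_pos[t]      # term present, class == target
        b = df - a         # term present, class != target
        c = n_pos - a      # term absent,  class == target
        d = n - n_pos - b  # term absent,  class != target
        # Pointwise MI: log(P(t, c) / (P(t) * P(c))); set to 0.0 when a == 0.
        mi = math.log((a * n) / (df * n_pos)) if a > 0 else 0.0
        # Chi-square statistic from the 2x2 contingency table.
        denom = (a + c) * (b + d) * (a + b) * (c + d)
        chi = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        scores[t] = (df, mi, chi)
    return scores


def min_max(values):
    """Rescale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def fused_ranking(scores):
    """Rank terms by the mean of their normalized DF, MI, and CHI scores.

    Averaging after min-max normalization is an assumption of this sketch.
    """
    terms = list(scores)
    cols = [min_max([scores[t][i] for t in terms]) for i in range(3)]
    fused = {t: sum(col[k] for col in cols) / 3 for k, t in enumerate(terms)}
    return sorted(fused, key=fused.get, reverse=True)
```

Because the plain chi-square statistic squares the association term, a word that is negatively correlated with the class can score as high as a positively correlated one, which is the kind of defect the standard score factor in the abstract is meant to address.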
Pages: 1518-1527
Page count: 9
Related papers
50 in total
  • [1] Sentiment Feature Selection Algorithm For Chinese Micro-blog
    Kun, Yu Jian
    Lei, Zhao
    2014 INTERNATIONAL CONFERENCE ON MANAGEMENT OF E-COMMERCE AND E-GOVERNMENT (ICMECG), 2014, : 114 - 118
  • [2] Real Time Micro-Blog Summarization based on Hadoop/HBase
    Lee, Sanghoon
    Shakya, Sunny
    Sunderraman, Raj
    Belkasim, Saeid
    2013 IEEE/WIC/ACM INTERNATIONAL JOINT CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY - WORKSHOPS (WI-IAT), VOL 3, 2013, : 46 - 49
  • [3] Modelling on Clustering Algorithm Based on Iteration Feature Selection for Micro-blog Posts
    Gao, Kai
    Zhang, Bao-quan
    2014 PROCEEDINGS OF THE 6TH INTERNATIONAL CONFERENCE ON MODELLING, IDENTIFICATION & CONTROL (ICMIC), 2014, : 295 - 299
  • [4] Tightening Data Analysis and Feature Extraction for Micro-blog Recommendation
    Li, Bo
    Wu, Xiang
    Xiang, Biao
    Zhang, Hui
    PROCEEDINGS OF THE 2013 6TH INTERNATIONAL CONFERENCE ON BIOMEDICAL ENGINEERING AND INFORMATICS (BMEI 2013), VOLS 1 AND 2, 2013, : 683 - 688
  • [5] Micro-blog influence evaluation method based on particle swarm optimization algorithm
    Song Junyuan
    2017 INTERNATIONAL CONFERENCE ON ROBOTS & INTELLIGENT SYSTEM (ICRIS), 2017, : 206 - 209
  • [6] Micro-blog hot topic prediction of LSSVM based on QPSO algorithm optimization
    Zhang, Yongjun
    Ma, Jialin
    Liu, Jinling
    Xiao, Shaozhang
    Metallurgical and Mining Industry, 2015, 7 (09): : 154 - 160
  • [7] A Feature Selection Algorithm of Micro-Blog Based on Rough Set and Probability-weighted
    Zhu, Yanhui
    Ai, Junhua
    Zeng, Zhigao
    Yang, Mingnian
    4TH INTERNATIONAL CONFERENCE ON MECHANICAL AUTOMATION AND MATERIALS ENGINEERING (ICMAME 2015), 2015, : 108 - 114
  • [8] An Ensemble Classification Algorithm of Micro-Blog Sentiment Based on Feature Selection and Differential Evolution
    Li, Hongchan
    Ma, Zishuai
    Zhu, Haodong
    Ma, Yu
    Chang, Zhifang
    IEEE ACCESS, 2022, 10 : 70467 - 70475
  • [9] Cyber Teaching Optimization Based on Micro-blog Communication
    Zhang, Bo
    PROCEEDINGS OF THE 2ND ANNUAL INTERNATIONAL CONFERENCE ON SOCIAL SCIENCE AND CONTEMPORARY HUMANITY DEVELOPMENT (SSCHD), 2016, 73 : 396 - 399
  • [10] The Domain Classification Algorithm Based on KNN in Micro-blog
    Zhu, Guofeng
    Zhou, Zhurong
    Han, Fengjiao
    Ying, Zhongyun
    PROCEEDINGS OF 2013 IEEE 4TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING AND SERVICE SCIENCE (ICSESS), 2012, : 188 - 192