Feature Dimension Reduction Optimization Algorithm for Massive Micro-Blog Data based on Hadoop

Cited by: 0
Authors
Zhu H. [1]
Li W. [1]
Li H. [1]
Affiliation
[1] School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou
Funding
National Natural Science Foundation of China
Keywords
Feature dimension reduction; Feature selection; Hadoop; HDFS; Micro-blogging emotion;
DOI
10.23940/ijpe.19.06.p3.15181527
Abstract
For micro-blog sentiment analysis in big data environments, the "curse of dimensionality" caused by the continuous growth of text data poses a major challenge. To address this problem, this paper proposes a feature dimensionality reduction algorithm that fuses the advantages of three traditional methods: document frequency (DF), mutual information (MI), and the chi-square test (CHI). First, a document frequency factor is added to the MI algorithm to correct its defect with low-frequency words. Then, a standard score factor is added to the CHI algorithm to address the negative correlation problem. Finally, the scores are averaged to fuse the advantages of the three algorithms, yielding an improved DF-MI-CHI fusion algorithm. Simulation results show that after micro-blog data are processed with this algorithm, sentiment analysis accuracy improves and remains at 95%, recall exceeds 90%, and the F-value stays between 92% and 94%. These results are higher than those of other improved algorithms and tend to be stable, indicating that the algorithm can effectively improve the accuracy and efficiency of micro-blog sentiment analysis on massive micro-blog text data. © 2019 Totem Publisher, Inc. All rights reserved.
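The abstract names the ingredients of the DF-MI-CHI fusion but not its exact formulas, so the following is only a minimal sketch under stated assumptions. It uses the textbook DF, MI, and CHI scores computed from per-term, per-class document counts A, B, C, D. The document-frequency factor (A + B) / N multiplying MI is a hypothetical choice; the paper's "standard score factor" for CHI is not specified here, so the sketch substitutes a sign correction that makes negatively correlated terms score below zero, and it applies standard scores (z-scores) to put the three metrics on a common scale before averaging. All function names (`contingency`, `mi_weighted`, `chi_signed`, `fused_scores`, `select_features`) and the toy data are hypothetical.

```python
import math
from statistics import mean, pstdev


def contingency(docs, labels, term, cls):
    """Count documents for one term/class pair.

    A: in cls, contains term    B: not in cls, contains term
    C: in cls, lacks term       D: not in cls, lacks term
    """
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        if label == cls:
            if has_term:
                a += 1
            else:
                c += 1
        elif has_term:
            b += 1
        else:
            d += 1
    return a, b, c, d


def mi_weighted(a, b, c, d):
    """Classic MI estimate scaled by a document-frequency factor.

    The factor (A + B) / N is a hypothetical choice: it damps the
    inflated MI scores that rare terms otherwise receive.
    """
    n = a + b + c + d
    if a == 0:
        return 0.0  # term never occurs in this class: no positive evidence
    mi = math.log(a * n / ((a + c) * (a + b)))
    return mi * (a + b) / n


def chi_signed(a, b, c, d):
    """Chi-square statistic, made negative for negatively correlated terms.

    The sign correction is a stand-in for the unspecified
    "standard score factor"; it is an assumption of this sketch.
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    chi = n * (a * d - b * c) ** 2 / denom
    return math.copysign(chi, a * d - b * c)


def zscores(values):
    """Standard scores, so DF, MI and CHI share a common scale."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def fused_scores(docs, labels):
    """Average the z-scored DF, weighted MI and signed CHI of each term."""
    vocab = sorted(set().union(*docs))
    classes = sorted(set(labels))
    raw = {"df": [], "mi": [], "chi": []}
    for term in vocab:
        tables = [contingency(docs, labels, term, cls) for cls in classes]
        a, b, _, _ = tables[0]
        raw["df"].append(a + b)  # DF does not depend on the class
        raw["mi"].append(max(mi_weighted(*t) for t in tables))
        raw["chi"].append(max(chi_signed(*t) for t in tables))
    z = {name: zscores(vals) for name, vals in raw.items()}
    return {
        term: mean((z["df"][i], z["mi"][i], z["chi"][i]))
        for i, term in enumerate(vocab)
    }


def select_features(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = fused_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


if __name__ == "__main__":
    # Toy, hypothetical micro-blog posts as sets of tokens.
    docs = [
        {"happy", "great", "day"},
        {"great", "movie"},
        {"sad", "bad", "day"},
        {"bad", "movie"},
    ]
    labels = ["pos", "pos", "neg", "neg"]
    print(select_features(docs, labels, 2))  # ['bad', 'great']
```

On the toy data, the two terms that occur in only one class and in more than one document, "bad" and "great", rank first, while class-neutral terms such as "day" and "movie" fall below zero. In a Hadoop setting, the per-term contingency counting is the part one would distribute (for example, as a MapReduce counting job over data in HDFS); this sketch keeps everything in memory for clarity.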
Pages: 1518–1527
Page count: 9