Feature Dimension Reduction Optimization Algorithm for Massive Micro-Blog Data based on Hadoop

Authors
Zhu H. [1 ]
Li W. [1 ]
Li H. [1 ]
Affiliation
[1] School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou
Funding
National Natural Science Foundation of China
Keywords
Feature dimension reduction; Feature selection; Hadoop; HDFS; Micro-blogging emotion
DOI
10.23940/ijpe.19.06.p3.15181527
Abstract
In big data environments, the "dimension disaster" caused by the continuous growth of text data poses a major challenge for micro-blog sentiment analysis. To address this problem, this paper proposes an algorithm that fuses the advantages of three traditional feature dimensionality reduction methods: document frequency (DF), mutual information (MI), and the chi-square test (CHI). First, a document frequency factor is added to the MI algorithm to correct its bias toward low-frequency words. Then, a standard score factor is added to the CHI algorithm to address the negative correlation problem. Finally, the average value is calculated so that the advantages of the three algorithms are combined, yielding the improved DF-MI-CHI fusion algorithm. Simulation results show that, after micro-blog data is processed with this algorithm, the accuracy of sentiment analysis improves and stays at 95%, the recall rate exceeds 90%, and the F value stays between 92% and 94%. Within this range the F value is higher than that of other improved algorithms and tends to be stable, which indicates that the algorithm can effectively improve the accuracy and efficiency of micro-blog sentiment analysis on massive micro-blog text data. © 2019 Totem Publisher, Inc. All rights reserved.
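The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so the code below is an assumption throughout: it uses the textbook definitions of DF, pointwise MI, and the chi-square statistic, min-max normalizes each score, and takes a plain average. It does not include the paper's document frequency factor for MI or its standard score factor for CHI, and it runs in memory rather than on Hadoop. The function names `score_terms`, `min_max`, and `fused_ranking` are hypothetical.

```python
import math
from collections import Counter


def score_terms(docs, labels):
    """Return (df, mi, chi) dicts for every term.

    docs   -- list of token lists
    labels -- class label for each document, in the same order
    Textbook definitions only; not the paper's modified formulas.
    """
    n = len(docs)
    classes = sorted(set(labels))
    class_count = Counter(labels)
    df = Counter()    # number of documents containing the term
    df_c = Counter()  # number of documents of class c containing the term
    for tokens, y in zip(docs, labels):
        for t in set(tokens):
            df[t] += 1
            df_c[(t, y)] += 1

    mi, chi = {}, {}
    for t in df:
        best_mi, best_chi = float("-inf"), 0.0
        for c in classes:
            a = df_c[(t, c)]         # in class c, contains t
            b = df[t] - a            # other classes, contains t
            cc = class_count[c] - a  # in class c, lacks t
            d = n - a - b - cc       # other classes, lacks t
            if a > 0:
                # pointwise MI: log( P(t, c) / (P(t) * P(c)) )
                best_mi = max(best_mi, math.log(a * n / ((a + b) * (a + cc))))
            denom = (a + b) * (cc + d) * (a + cc) * (b + d)
            if denom > 0:
                best_chi = max(best_chi, n * (a * d - b * cc) ** 2 / denom)
        mi[t] = best_mi    # maximum over classes
        chi[t] = best_chi  # maximum over classes
    return dict(df), mi, chi


def min_max(scores):
    """Rescale scores to [0, 1]; a constant score map becomes all zeros."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {t: (s - lo) / span if span else 0.0 for t, s in scores.items()}


def fused_ranking(docs, labels, k):
    """Top-k terms by the plain average of normalized DF, MI, and CHI."""
    df, mi, chi = score_terms(docs, labels)
    parts = [min_max(df), min_max(mi), min_max(chi)]
    fused = {t: sum(p[t] for p in parts) / 3 for t in df}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(fused, key=lambda t: (-fused[t], t))[:k]


docs = [
    ["good", "happy", "the"],
    ["good", "great", "the"],
    ["bad", "sad", "the"],
    ["bad", "awful", "the"],
]
labels = ["pos", "pos", "neg", "neg"]
top_terms = fused_ranking(docs, labels, 2)
```

On this toy corpus the class-specific words that occur in more than one document rank first, while the word that appears in every document ranks last: DF alone would favor it, but MI and CHI give it no weight. This is the kind of complementary behavior that averaging the three scores is meant to exploit.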
Pages: 1518 - 1527
Page count: 9