Feature Dimension Reduction Optimization Algorithm for Massive Micro-Blog Data based on Hadoop

Authors
Zhu H. [1]
Li W. [1]
Li H. [1]
Affiliation
[1] School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou
Funding
National Natural Science Foundation of China
Keywords
Feature dimension reduction; Feature selection; Hadoop; HDFS; Micro-blogging emotion
DOI
10.23940/ijpe.19.06.p3.15181527
Abstract
In big data environments, the "dimension disaster" caused by the continuous growth of text data poses a great challenge to micro-blog sentiment analysis. To address this problem, this paper proposes an algorithm that fuses the advantages of three traditional feature dimensionality reduction algorithms: document frequency (DF), mutual information (MI), and the chi-square test (CHI). First, a document frequency factor is added to the MI algorithm to correct its bias toward low-frequency words. Then, a standard score factor is added to the CHI algorithm to solve the negative correlation problem. Finally, the average value is calculated so that the advantages of the three algorithms are fused, yielding an improved DF-MI-CHI fusion algorithm. The simulation results show that after this algorithm is used to process the micro-blog data, the accuracy of sentiment analysis is improved and maintained at 95%, the recall rate is more than 90%, and the F value is maintained between 92% and 94%. These values are higher than those of other improved algorithms and tend to be stable, which indicates that the algorithm can effectively improve the accuracy and efficiency of micro-blog sentiment analysis when dealing with massive micro-blog text data. © 2019 Totem Publisher, Inc. All rights reserved.
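
The three baseline scores and the averaging step named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it uses the textbook definitions of DF, MI, and CHI from a 2x2 term/class contingency table, and the fusion shown (min-max normalization followed by a plain average) is a hypothetical stand-in. The abstract does not give the formulas for the document frequency factor or the standard score factor, so both are omitted. The function names and the toy corpus are invented for the example.

```python
import math

def df_mi_chi(docs, labels, term, target):
    """Textbook DF, MI and CHI of `term` for class `target` (2x2 contingency table)."""
    n = len(docs)
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        present = term in tokens
        in_class = label == target
        if present and in_class:
            a += 1          # term present, document in target class
        elif present:
            b += 1          # term present, document in another class
        elif in_class:
            c += 1          # term absent, document in target class
        else:
            d += 1          # term absent, document in another class
    df = (a + b) / n
    # Pointwise MI: log(P(t, c) / (P(t) * P(c))). Assumption: a term that never
    # co-occurs with the class gets 0.0 instead of -infinity.
    mi = math.log(a * n / ((a + b) * (a + c))) if a > 0 else 0.0
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi = n * (a * d - b * c) ** 2 / denom if denom > 0 else 0.0
    return df, mi, chi

def min_max(values):
    """Rescale a list of scores to [0, 1]; a constant list maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def fused_ranking(docs, labels, target):
    """Hypothetical fusion: average of the three min-max normalized scores."""
    vocab = sorted({t for tokens in docs for t in tokens})
    raw = [df_mi_chi(docs, labels, t, target) for t in vocab]
    columns = [min_max([r[i] for r in raw]) for i in range(3)]
    fused = {t: sum(col[j] for col in columns) / 3 for j, t in enumerate(vocab)}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Toy corpus (invented): each document is a set of tokens with a sentiment label.
docs = [{"happy", "great", "day"}, {"great", "fun"}, {"sad", "day"}, {"awful", "sad"}]
labels = ["pos", "pos", "neg", "neg"]
ranking = fused_ranking(docs, labels, "pos")
```

On this toy corpus, "great" (which appears only in positive documents) and "sad" (which appears only in negative documents) receive the same plain CHI score for the positive class, because the statistic squares the correlation term. This is the negative correlation problem that the abstract says the standard score factor is meant to address.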
Pages: 1518-1527
Page count: 9