Massive Data Mining Algorithm for Web Text Based on Clustering Algorithm

Cited by: 4
Author
Luo, Nan-Chao [1]
Affiliation
[1] Aba Teachers Univ, Sch Math & Comp Sci, Wenchuan 623002, Sichuan, Peoples R China
Keywords
clustering algorithm; Web text; massive data; data mining algorithm
DOI
10.20965/jaciii.2019.p0362
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Massive Web text data is high-dimensional and sparsely distributed, so current data mining algorithms suffer from low mining precision and long running times when applied to it. To address these problems, a massive data mining algorithm for Web text based on clustering is proposed. Feature words are extracted from the massive data using the chi-square test to obtain a feature word set. The feature set is hierarchically clustered, the TF-IDF value of each word in the clustered set is calculated, and a vector space model is constructed. Introducing a fair operation and a clone operation into the bee colony algorithm improves the diversity of the vector space models. K-means is then applied to the resulting cluster centers to extract local centroids and improve the quality of data mining. Experimental results show that the proposed algorithm effectively improves data mining accuracy and reduces time consumption.
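The first two steps named in the abstract, chi-square feature selection and TF-IDF weighting, can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: it assumes each document carries a class label for the chi-square test (the abstract does not say how the categories are obtained), and the function names `chi_square`, `select_features`, and `tfidf_vectors` are hypothetical. The hierarchical clustering, bee colony, and K-means stages are not shown.

```python
import math
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(docs, labels, top_k):
    """Rank terms by their best chi-square score over all classes.

    Assumption: labels are available; the source abstract does not specify this.
    """
    doc_sets = [set(d) for d in docs]
    vocab = set().union(*doc_sets) if doc_sets else set()
    scores = {}
    for term in vocab:
        best = 0.0
        for c in set(labels):
            n11 = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == c)
            n10 = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != c)
            n01 = sum(1 for s, y in zip(doc_sets, labels) if term not in s and y == c)
            n00 = len(docs) - n11 - n10 - n01
            best = max(best, chi_square(n11, n10, n01, n00))
        scores[term] = best
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def tfidf_vectors(docs, features):
    """Build one TF-IDF vector per document over the selected feature words."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d) if t in features)
    vectors = []
    for d in docs:
        tf = Counter(d)
        length = len(d) or 1
        vectors.append([
            (tf[t] / length) * math.log(n / df[t]) if df[t] else 0.0
            for t in features
        ])
    return vectors


# Toy corpus (made up for illustration only).
docs = [["web", "data"], ["web", "mining"], ["sport", "ball"], ["sport", "game"]]
labels = ["tech", "tech", "sport", "sport"]
features = select_features(docs, labels, top_k=2)
vectors = tfidf_vectors(docs, features)
```

The resulting vectors form the vector space model on which a clustering step would then operate.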
Pages: 362-365
Page count: 4
Related papers
50 in total
  • [1] DFSSM Based Web Text Clustering Algorithm
    Qian, Rong
    Zhang, Kejun
    Zhao, Xiaorong
    [J]. PROCEEDINGS OF 2012 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT 2012), 2012, : 703 - 707
  • [2] A New Web Text Clustering Algorithm Based on DFSSM
    Yang, Bingru
    Song, Zefeng
    Wang, Yinglong
    Song, Wei
    [J]. PROCEEDINGS OF THE INTERNATIONAL SYMPOSIUM ON ELECTRONIC COMMERCE AND SECURITY, 2008, : 27 - 32
  • [3] WTCA: A Web Text Clustering Algorithm Based on DFSSM
    Zheng, Yu
    Rong, Qian
    [J]. PROCEEDINGS OF THE 27TH CHINESE CONTROL CONFERENCE, VOL 5, 2008, : 811 - +
  • [4] Fuzzy Set Based Clustering Algorithm of Web Text
    Wan, Hongxin
    Peng, Yun
    [J]. ADVANCES IN MECHATRONICS AND CONTROL ENGINEERING III, 2014, 678 : 19 - +
  • [5] Research on the Massive Redundant Data Mining Algorithm based on Kernel Clustering and Data Cleaning Technology
    Mao, YaoFeng
    [J]. 2015 2ND INTERNATIONAL SYMPOSIUM ON ENGINEERING TECHNOLOGY, EDUCATION AND MANAGEMENT (ISETEM 2015), 2015, : 112 - 117
  • [6] A Hash-based Hierarchical Algorithm for Massive Text Clustering
    Luo, Yin
    Fu, Yan
    [J]. 2009 INTERNATIONAL SYMPOSIUM ON WEB INFORMATION SYSTEMS AND APPLICATIONS, PROCEEDINGS, 2009, : 140 - +
  • [7] A Kind of Improved Data Clustering Algorithm in Web Log Mining
    Guo, Jin
    Zhang, Shengbing
    Qiu, Zheng
    [J]. PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS RESEARCH AND MECHATRONICS ENGINEERING, 2015, 121 : 2115 - 2119
  • [8] A Text Mining Model Based on Improved Density Clustering Algorithm
    Chen Qi
    Lu Jianfeng
    Zhang Hao
    [J]. 2013 IEEE 4TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC), 2014, : 337 - 339
  • [9] Fuzzy Set Based Web Opinion Text Clustering Algorithm
    Wan, Hongxin
    Peng, Yun
    [J]. PROCEEDINGS OF THE 4TH INTERNATIONAL CONFERENCE ON MECHATRONICS, MATERIALS, CHEMISTRY AND COMPUTER ENGINEERING 2015 (ICMMCCE 2015), 2015, 39 : 2604 - 2607
  • [10] An Algorithm of Web Text Clustering Analysis Based on Fuzzy Set
    Peng, Yun
    Ding, Shu-liang
    [J]. ISCSCT 2008: INTERNATIONAL SYMPOSIUM ON COMPUTER SCIENCE AND COMPUTATIONAL TECHNOLOGY, VOL 1, PROCEEDINGS, 2008, : 109 - 113