Similarity Webpage Denoising Data Clustering Algorithm Based on Time Series

Cited by: 0
Authors
Hang Chun-mei [1]
Wu Yang-yang [1]
Affiliation
[1] HuaQiao Univ, Dept Comp Sci & Technol, Xiamen 361021, Peoples R China
Keywords
Similarity matching; intrinsic mode function; weighted processing
DOI
10.1109/ICMTMA.2015.240
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Empirical mode decomposition (EMD) is often chosen for processing large volumes of non-stationary webpage data or non-first-order sequence webpage data, which typically exhibit a very high noise ratio. Applying EMD to the sequence data yields a set of intrinsic mode functions (IMFs) and a residual series. The IMFs contain local characteristic data over different time ranges, which gives them a denoising property. Exploiting the different characteristics that each IMF covers, the method decomposes the initial webpage information with EMD to extract the relevant information, assigns a different weight to each IMF according to its features, and then uses the Euclidean distance to assess the degree of similarity. The results show that, compared with the previous approach of matching directly, the IMF-based approach, which first decomposes the time series to eliminate the influence of noise and then matches using weighted processing, greatly improves matching accuracy, indicating that the method is effective.
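The weighted matching step described in the abstract can be illustrated with a minimal sketch. It assumes the IMFs have already been produced by some EMD implementation and are passed in as plain lists. The function names, the weights, and the toy data are hypothetical and are not taken from the paper, which does not state in this record how the weights are chosen.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_imf_distance(
    imfs_a: Sequence[Sequence[float]],
    imfs_b: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Weighted sum of per-IMF Euclidean distances (hypothetical scheme).

    imfs_a[k] and imfs_b[k] are the k-th IMFs of the two series; weights[k]
    controls how much that IMF contributes to the overall distance.
    """
    if not (len(imfs_a) == len(imfs_b) == len(weights)):
        raise ValueError("need one weight per IMF pair")
    return sum(w * euclidean(a, b) for w, a, b in zip(weights, imfs_a, imfs_b))


# Toy data (assumed): IMF 0 is a fast oscillation treated as noise,
# IMF 1 is the slow component that carries the shared trend.
imfs_a = [[0.5, -0.5, 0.5, -0.5], [1.0, 2.0, 3.0, 4.0]]
imfs_b = [[-0.5, 0.5, -0.5, 0.5], [1.0, 2.0, 3.0, 4.0]]
weights = [0.1, 0.9]  # assumed: down-weight the noisy high-frequency IMF

# Direct matching compares the raw series, i.e. the sum of their IMFs.
series_a = [sum(col) for col in zip(*imfs_a)]
series_b = [sum(col) for col in zip(*imfs_b)]

direct = euclidean(series_a, series_b)
weighted = weighted_imf_distance(imfs_a, imfs_b, weights)
print(direct, weighted)
```

In this toy case the two series share the same slow component and differ only in the fast oscillation, so down-weighting the first IMF makes them appear much closer than direct matching does. Whether a weighted sum of per-IMF distances matches the paper's exact formulation cannot be confirmed from the abstract alone.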
Pages: 984-986
Page count: 3