A novel clustering algorithm for time-series data based on precise correlation coefficient matching in the IoT

Cited by: 7
Authors
Li, Haibo [1 ,2 ]
Tong, Juncheng [1 ]
Affiliations
[1] Huaqiao Univ, Coll Comp Sci & Technol, Xiamen 361021, Fujian, Peoples R China
[2] Xiamen Engn Res Ctr Enterprise Interoperabil & Bu, Xiamen 361021, Fujian, Peoples R China
Keywords
Internet of Things; time series; Pearson correlation coefficient; clustering; precise matching; BIG DATA; INTERNET; THINGS; SYSTEM; MANAGEMENT
DOI
10.3934/mbe.2019331
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
In smart environments based on the Internet of Things (IoT), almost all of the object information collected by sensors is time-series data that records the behavior of the objects. Analyzing the correlation between different time series, rather than within a single time series, is more helpful for discovering their behavioral relations. This has become an important issue in the IoT. To analyze this correlation, a clustering algorithm named CPCCM (clustering algorithm based on precise correlation coefficient matching) is presented. First, each initial sequence is split into a set of subsequences using a preset sliding window. Then, the correlation coefficients between any pair of subsequence sets from two sequences are computed. Those pairs that pass a preset Pearson correlation coefficient threshold are clustered. In the CPCCM, a cross-traversal strategy is introduced to improve search efficiency: it searches the subsequences of the two subsequence sets alternately. To improve clustering efficiency, adjacent subsequences in each initial sequence are merged into a longer subsequence, which replaces them, if they appear in the same subsequence set. Finally, an analysis of practical electric power consumption data shows that the CPCCM is promising and can be applied in similar scenarios. Compared with the agglomerative hierarchical clustering algorithm, the major contributions of this work are that clustering quality is improved by the precise matching and cross-traversal strategies, and that the complexity of the algorithm is reduced by merging adjacent subsequences. Therefore, CPCCM can be applied to analyze behavior between different objects in smart environments.
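The windowing, thresholding, and merging steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' CPCCM implementation: it compares every window pair exhaustively instead of using the cross-traversal strategy, and the function names (`sliding_windows`, `pearson`, `matched_pairs`, `merge_adjacent`), the `step` parameter, and the rule that touching or overlapping windows are merged are all assumptions made for illustration.

```python
import numpy as np


def sliding_windows(series, width, step=1):
    """Split a 1-D sequence into (start_index, subsequence) pairs."""
    series = np.asarray(series, dtype=float)
    last_start = len(series) - width
    return [(s, series[s:s + width]) for s in range(0, last_start + 1, step)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length arrays.

    Returns 0.0 when either array has zero variance (an assumed
    convention; the coefficient is undefined in that case).
    """
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    if denom == 0:
        return 0.0
    return float((xc * yc).sum() / denom)


def matched_pairs(a, b, width, threshold, step=1):
    """Return (start_in_a, start_in_b, r) for window pairs with r >= threshold.

    Exhaustive comparison of all window pairs; an illustrative stand-in
    for the paper's cross-traversal search.
    """
    pairs = []
    for i, x in sliding_windows(a, width, step):
        for j, y in sliding_windows(b, width, step):
            r = pearson(x, y)
            if r >= threshold:
                pairs.append((i, j, r))
    return pairs


def merge_adjacent(starts, width):
    """Merge windows [s, s + width) that touch or overlap into longer spans."""
    merged = []
    for s in sorted(set(starts)):
        if merged and s <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], s + width)
        else:
            merged.append([s, s + width])
    return [tuple(span) for span in merged]
```

For example, `matched_pairs(a, b, width=3, threshold=0.9)` lists the window pairs of two sequences whose coefficient meets the threshold, and `merge_adjacent` can then collapse the matched start indices of one sequence into longer spans.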
Pages: 6654-6671
Number of pages: 18
Related papers
50 in total
  • [31] A novel short-term load forecasting framework based on time-series clustering and early classification algorithm
    Chen, Zhe
    Chen, Yongbao
    Xiao, Tong
    Wang, Huilong
    Hou, Pengwei
    [J]. Energy and Buildings, 2021, 251
  • [33] A MPAA-based iterative clustering algorithm augmented by nearest neighbors search for time-series data streams
    Lin, J
    Vlachos, M
    Keogh, E
    Gunopulos, D
    Liu, JW
    Yu, SJ
    Le, JJ
    [J]. ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2005, 3518 : 333 - 342
  • [34] IoTPass: IoT Data Management System for Processing Time-series Data
    Nie, Zehua
    Su, Can
    Mao, Yichen
    Bian, Kaigui
    [J]. 2022 TENTH INTERNATIONAL CONFERENCE ON ADVANCED CLOUD AND BIG DATA, CBD, 2022, : 288 - 293
  • [35] Time-Series Forecasting to Fill Missing Data in IoT Sensor Data
    Rosero-Montalvo, Paul D.
    Tozun, Pinar
    Hernandez, Wilmar
    [J]. IEEE SENSORS LETTERS, 2023, 7 (09)
  • [36] Wavelet based correlation coefficient of time series of Saudi Meteorological Data
    Rehman, S.
    Siddiqi, A. H.
    [J]. CHAOS SOLITONS & FRACTALS, 2009, 39 (04) : 1764 - 1789
  • [37] Key radar signal fast recognition method based on clustering and time-series correlation
    Zhang Y.
    Guo W.
    Kang K.
    Yao Y.
    Wang P.
    [J]. 1600, Chinese Institute of Electronics (42): : 597 - 602
  • [38] A time-series approach for clustering farms based on slaughterhouse health aberration data
    Hulsegge, B.
    de Greef, K. H.
    [J]. PREVENTIVE VETERINARY MEDICINE, 2018, 153 : 64 - 70
  • [39] Research on cognitive biology based algorithm for mining time-series data
    Yang, BR
    Li, LX
    Song, W
    [J]. International Conference on Computing, Communications and Control Technologies, Vol 2, Proceedings, 2004, : 279 - 284
  • [40] A novel clustering algorithm based on graph matching
    Lin, Guoyuan
    Bie, Yuyu
    Wang, Guohui
    Lei, Min
    [J]. Journal of Software, 2013, 8 (04) : 1035 - 1041