A novel clustering algorithm for time-series data based on precise correlation coefficient matching in the IoT

Cited by: 7
Authors
Li, Haibo [1 ,2 ]
Tong, Juncheng [1 ]
Affiliations
[1] Huaqiao Univ, Coll Comp Sci & Technol, Xiamen 361021, Fujian, Peoples R China
[2] Xiamen Engn Res Ctr Enterprise Interoperabil & Bu, Xiamen 361021, Fujian, Peoples R China
Keywords
Internet of Things; time series; Pearson correlation coefficient; clustering; precise matching; BIG DATA; INTERNET; THINGS; SYSTEM; MANAGEMENT
DOI
10.3934/mbe.2019331
Chinese Library Classification number
Q [Biological Sciences];
Subject classification code
07 ; 0710 ; 09 ;
Abstract
In smart environments based on the Internet of Things (IoT), almost all of the object information collected by sensors is time series data that records the behavior of the objects. Analyzing the correlation between different time series, rather than within a single time series, is more helpful for discovering behavioral relations between objects, and this has become an important issue in the IoT. To analyze this correlation, a clustering algorithm named CPCCM (clustering algorithm based on precise correlation coefficient matching) is presented. First, each initial sequence is split into a set of subsequences using a preset sliding window. Then, the correlation coefficients between pairs of subsequences drawn from the subsequence sets of two sequences are computed, and the pairs that exceed a preset Pearson correlation coefficient threshold are clustered. CPCCM introduces a cross-traversal strategy, which searches the subsequences of the two subsequence sets alternately, to improve search efficiency. To improve clustering efficiency, adjacent subsequences of an initial sequence that appear in the same subsequence set are merged into a longer subsequence, which replaces them. Finally, an analysis of practical electric power consumption data shows that CPCCM is promising and can be applied in similar scenarios. Compared with the agglomerative hierarchical clustering algorithm, the major contributions of this work are that clustering quality is improved by the precise matching and cross-traversal strategies, and that the complexity of the algorithm is reduced by merging adjacent subsequences. CPCCM can therefore be applied to analyze behavior between different objects in smart environments.
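The three steps named in the abstract (sliding-window splitting, Pearson threshold matching, and merging of adjacent subsequences) can be illustrated with a minimal Python sketch. This is not the authors' CPCCM implementation: the function names, the `width`, `step`, and `threshold` parameters, and the choice to compare only windows with the same start index are assumptions made for illustration, and the cross-traversal strategy is not modeled.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: treat a constant window as uncorrelated
    return cov / sqrt(vx * vy)


def windows(seq, width, step=1):
    """Split seq into (start, subsequence) pairs with a sliding window."""
    return [(i, seq[i:i + width]) for i in range(0, len(seq) - width + 1, step)]


def matched_windows(a, b, width, threshold):
    """Start indices where same-position windows of a and b reach the threshold.

    Assumption: only windows with equal start indices are compared.
    """
    wb = dict(windows(b, width))
    return [i for i, sub in windows(a, width)
            if i in wb and pearson(sub, wb[i]) >= threshold]


def merge_adjacent(starts, width):
    """Merge overlapping or touching windows into (start, end) ranges, end exclusive."""
    merged = []
    for s in sorted(starts):
        if merged and s <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], s + width))
        else:
            merged.append((s, s + width))
    return merged


# Hypothetical data: the two series move together only over the first six samples.
a = [1, 2, 3, 4, 5, 6, 1, 9, 2, 7]
b = [2, 4, 6, 8, 10, 12, 20, 1, 8, 0]
starts = matched_windows(a, b, width=3, threshold=0.99)
print(merge_adjacent(starts, width=3))
```

Merging the matched windows before any further clustering reduces the number of subsequences to handle, which is the efficiency argument the abstract makes for the merging step.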
Pages: 6654-6671
Page count: 18