Distance variable improvement of time-series big data stream evaluation

Cited by: 3
Authors
Wibisono, Ari [1]
Mursanto, Petrus [1]
Adibah, Jihan [1]
Bayu, Wendy D. W. T. [1]
Rizki, May Iffah [1]
Hasani, Lintang Matahari [1]
Ahli, Valian Fil [1]
Affiliations
[1] Univ Indonesia, Fac Comp Sci, Ui Depok 16424, Depok, Indonesia
Keywords
Intelligent systems; Data stream; Distance improvement; Big data regression; Prediction
DOI
10.1186/s40537-020-00359-w
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Mining information in real time from a big dataset of time-series data is a very challenging task. To this end, we propose using the mean distance and the standard deviation to improve the accuracy of the existing Fast Incremental Model Trees with Drift Detection (FIMT-DD) algorithm. The standard FIMT-DD algorithm uses the Hoeffding bound as its splitting criterion; we additionally use the mean distance and the standard deviation so that the tree is split more accurately than with the standard method. We verify the proposed method on three datasets: the large Traffic Demand Dataset, which consists of 4,000,000 instances; Tennet's big wind power plant dataset, which consists of 435,268 instances; and a road weather dataset, which consists of 30,000,000 instances. The results show that the proposed FIMT-DD algorithm is more accurate than both the standard method and the Chernoff bound approach. The measured errors show that our approach yields a lower Mean Absolute Percentage Error (MAPE) at every stage of learning, by approximately 2.49% compared with the Chernoff bound method and 19.65% compared with the standard method.
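For readers unfamiliar with the two quantities the abstract relies on, the following is a minimal sketch of a Hoeffding-bound split test and of the MAPE metric. It illustrates only the baseline criterion, not the authors' mean-distance and standard-deviation extension, whose details the abstract does not give. The function names (`hoeffding_bound`, `should_split`, `mape`) are hypothetical, and the split rule assumes the commonly described FIMT-DD form in which the ratio of the second-best to the best standard deviation reduction (SDR) lies in [0, 1].

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2 * n)).

    value_range (R): range of the random variable being averaged.
    delta: allowed probability that the bound is violated.
    n: number of observations seen so far.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_sdr, second_best_sdr, delta, n):
    """Assumed baseline rule: split when ratio + epsilon < 1.

    The ratio second_best_sdr / best_sdr is taken to lie in [0, 1],
    so the range R passed to the bound is 1.
    """
    if best_sdr <= 0:
        return False  # no candidate split reduces the deviation
    ratio = second_best_sdr / best_sdr
    epsilon = hoeffding_bound(1.0, delta, n)
    return ratio + epsilon < 1.0


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent.

    Pairs whose actual value is zero are skipped (an assumption made
    here to avoid division by zero).
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)
```

Under these assumptions, a leaf that has seen 200 instances with delta = 0.01 has a bound of roughly 0.107, so it would split when the second-best candidate achieves less than about 89% of the best candidate's SDR.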
Pages: 13