Incremental Factorization of Big Time Series Data with Blind Factor Approximation

Cited by: 72
Authors
Chen, Dan [1 ]
Tang, Yunbo [1 ]
Zhang, Hao [1 ]
Wang, Lizhe [2 ]
Li, Xiaoli [3 ]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] China Univ Geosci, Sch Comp Sci, Wuhan 430074, Peoples R China
[3] Beijing Normal Univ, Natl Key Lab Cognit Neurosci & Learning, Beijing 100875, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Big time series data; tensor factorization; blind factor approximation; parallel factor analysis; variational Bayesian inference; EEG; massively parallel computing; NONNEGATIVE MATRIX; MULTIWAY ANALYSIS; ALGORITHMS;
DOI
10.1109/TKDE.2019.2931687
CLC Number (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Extracting the latent factors of big time series data is an important means to examine the dynamic complex systems under observation. These low-dimensional and "small" representations reveal the key insights into the overall mechanisms, which can otherwise be obscured by the notoriously high dimensionality and scale of big data as well as the enormously complicated interdependencies amongst data elements. However, grand challenges remain: (1) to incrementally derive the multi-mode factors of the augmenting big data and (2) to achieve this goal under the circumstance of insufficient a priori knowledge. This study develops an incremental parallel factorization solution (namely I-PARAFAC) for huge augmenting tensors (multi-way arrays), consisting of three phases over a cutting-edge GPU cluster: in the "giant-step" phase, a variational Bayesian inference (VBI) model estimates the distribution of the close neighborhood of each factor at a high confidence level without the need for a priori knowledge of the tensor or the problem domain; in the "baby-step" phase, a massively parallel Fast-HALS algorithm (namely G-HALS) derives the accurate subfactors of each subtensor on the basis of the initial factors; in the final fusion phase, I-PARAFAC fuses the known factors of the original tensor with the accurate subfactors of the "increment" to obtain the final full factors. Experimental results indicate that: (1) the VBI model enables blind factor approximation, in which the distribution of the close neighborhood of each final factor can be derived quickly (10 iterations for the test case); as a result, the low-complexity model significantly accelerates the derivation of the final accurate factors and lowers the risk of errors; (2) I-PARAFAC significantly outperforms even the latest high-performance counterpart when handling augmenting tensors, e.g., the added overhead is only proportional to the increment while the latter has to repeatedly factorize the whole tensor, and the overhead of fusing subfactors is always minimal; (3) I-PARAFAC can factorize a huge tensor (volume up to 500 TB over 50 nodes) as a whole, with a capacity several orders of magnitude higher than that of conventional methods, and the runtime scales on the order of $\frac{1}{n}$ with the number of compute nodes $n$; (4) I-PARAFAC supports correct factorization-based analysis of a real 4th-order EEG dataset captured from a variety of epilepsy patients. Overall, it should also be noted that counterpart methods have to factorize the whole tensor from scratch whenever the tensor is augmented along any dimension; in contrast, the I-PARAFAC framework only needs to incrementally compute the full factors of the huge augmented tensor.
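To make the incremental workflow described in the abstract concrete, the sketch below is a minimal NumPy illustration, not the authors' GPU-based I-PARAFAC/G-HALS implementation. It assumes a simplified setting: a 3-way CP (PARAFAC) model whose tensor grows along a single mode (e.g., the time mode), with the factors of the other modes treated as approximately fixed. Under those assumptions, the increment's subfactor is estimated with a few Fast-HALS-style column updates and then fused with the known factor by stacking; the function names (khatri_rao, factor_increment, fuse_mode0) are illustrative only.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Khatri-Rao product of B (J x R) and C (K x R) -> (J*K x R)."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def factor_increment(dX, B, C, n_iters=10, eps=1e-12):
    """Estimate the mode-0 subfactor of an increment dX (dI x J x K) appended
    along mode 0, keeping the shared factors B and C fixed (a simplification
    of the paper's 'baby-step' G-HALS phase)."""
    dI, J, K = dX.shape
    M = dX.reshape(dI, J * K)        # mode-0 unfolding of the increment
    KR = khatri_rao(B, C)            # (J*K x R)
    G = KR.T @ KR                    # (R x R) Gram matrix
    MK = M @ KR                      # (dI x R)
    dA = np.maximum(MK @ np.linalg.pinv(G), eps)   # least-squares warm start
    for _ in range(n_iters):         # Fast-HALS-style nonnegative column updates
        for r in range(G.shape[0]):
            dA[:, r] = np.maximum(
                dA[:, r] + (MK[:, r] - dA @ G[:, r]) / G[r, r], eps)
    return dA

def fuse_mode0(A, dA):
    """Fusion step for growth along mode 0: stack the known factor with the
    increment's subfactor."""
    return np.vstack([A, dA])

# Usage sketch with synthetic data: (A, B, C) are the known CP factors of the
# original tensor; dX holds 8 newly arrived slices along mode 0.
rng = np.random.default_rng(0)
I, J, K, R = 40, 30, 20, 5
A, B, C = (rng.random((n, R)) for n in (I, J, K))
dX = np.einsum('ir,jr,kr->ijk', rng.random((8, R)), B, C)
A_full = fuse_mode0(A, factor_increment(dX, B, C))   # shape (I + 8, R)
```

The point of the sketch is the cost profile claimed in the abstract: only the increment is unfolded and refined, so the added work is proportional to the increment's size rather than to the whole (re-)factorized tensor.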
Pages: 569-584
Number of pages: 16
Related Papers
50 records in total
  • [1] A Massively Parallel Bayesian Approach to Factorization-Based Analysis of Big Time Series Data
    Gao, Tengfei
    Liu, Yongyan
    Tang, Yunbo
    Zhang, Lei
    Chen, Dan
    [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2019, 56 (07): 1567 - 1577
  • [2] Compressing Sampling for Time Series Big Data
    Miao Bei-bei
    Jin Xue-bo
    [J]. 2015 34TH CHINESE CONTROL CONFERENCE (CCC), 2015, : 4957 - 4961
  • [3] Data Approximation for Time Series Data in Wireless Sensor Networks
    Xu, Xiaobin
    [J]. INTERNATIONAL JOURNAL OF DATA WAREHOUSING AND MINING, 2016, 12 (03) : 1 - 13
  • [4] Real Time Interpretation and Optimization of Time Series Data Stream in Big Data
    Jiang, Zheyuan
    Liu, Ke
    [J]. 2018 IEEE 3RD INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS (ICCCBDA), 2018, : 243 - 247
  • [5] An incremental approach for real-time Big Data visual analytics
    Garcia, Ignacio
    Casado, Ruben
    Bouchachia, Abdelhamid
    [J]. 2016 IEEE 4TH INTERNATIONAL CONFERENCE ON FUTURE INTERNET OF THINGS AND CLOUD WORKSHOPS (FICLOUDW), 2016, : 177 - 182
  • [6] Finformer: Fast Incremental and General Time Series Data Prediction
    Bou, Savong
    Amagasa, Toshiyuki
    Kitagawa, Hiroyuki
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2024, E107D (05) : 625 - 637
  • [7] InTrans: Fast Incremental Transformer for Time Series Data Prediction
    Bou, Savong
    Amagasa, Toshiyuki
    Kitagawa, Hiroyuki
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2022, PT II, 2022, 13427 : 47 - 61
  • [8] The Ensemble of Unsupervised Incremental Learning Algorithm for Time Series Data
    Beulah, D.
    Raj, P. Vamsi Krishna
    [J]. International Journal of Engineering, Transactions B: Applications, 2022, 35 (02): 319 - 326
  • [9] Mining and Forecasting of Big Time-series Data
    Sakurai, Yasushi
    Matsubara, Yasuko
    Faloutsos, Christos
    [J]. SIGMOD'15: PROCEEDINGS OF THE 2015 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2015, : 919 - 922
  • [10] Efficient Geospatial Analytics on Time Series Big Data
    Al Jawarneh, Isam Mashhour
    Bellavista, Paolo
    Corradi, Antonio
    Foschini, Luca
    Montanari, Rebecca
    [J]. IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2022), 2022, : 3002 - 3008