Greedy Gaussian segmentation of multivariate time series

Cited by: 51
Authors
Hallac, David [1 ]
Nystrup, Peter [2 ]
Boyd, Stephen [1 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Tech Univ Denmark, Lyngby, Denmark
Keywords
Time series analysis; Change-point detection; Financial regimes; Text segmentation; Covariance regularization; Greedy algorithms; DETECTING CHANGE-POINTS; HIDDEN MARKOV-MODELS; SPARSITY
DOI
10.1007/s11634-018-0335-0
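Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract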
We consider the problem of breaking a multivariate (vector) time series into segments over which the data is well explained as independent samples from a Gaussian distribution. We formulate this as a covariance-regularized maximum likelihood problem, which can be reduced to a combinatorial optimization problem of searching over the possible breakpoints, or segment boundaries. This problem can be solved using dynamic programming, with complexity that grows with the square of the time series length. We propose a heuristic method that approximately solves the problem in linear time with respect to this length, and always yields a locally optimal choice, in the sense that no change of any one breakpoint improves the objective. Our method, which we call greedy Gaussian segmentation (GGS), easily scales to problems with vectors of dimension over 1000 and time series of arbitrary length. We discuss methods that can be used to validate such a model using data, and also to automatically choose appropriate values of the two hyperparameters in the method. Finally, we illustrate our GGS approach on financial time series and Wikipedia text data.
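The greedy idea in the abstract can be illustrated with a minimal sketch. It is not the authors' GGS implementation. It assumes the regularizer is a penalty `lam * trace(inverse covariance)`, which gives the per-segment covariance estimate `S + (lam / m) * I`; this form is an assumption, since the abstract only says "covariance-regularized". The function names `segment_score`, `best_split`, and `greedy_segment` are hypothetical. The sketch adds one breakpoint at a time by exhaustive scan, so it is neither linear-time nor does it include a breakpoint-adjustment pass, and it does not enforce the local optimality property described above.

```python
import numpy as np


def segment_score(x, lam):
    """Regularized Gaussian log-likelihood of one segment (rows are samples).

    Assumption: the penalty is lam * trace(inv(Sigma)), so the maximizing
    covariance is S + (lam / m) * I, with S the empirical covariance.
    """
    m, n = x.shape
    centered = x - x.mean(axis=0)
    S = centered.T @ centered / m
    sigma = S + (lam / m) * np.eye(n)
    _, logdet = np.linalg.slogdet(sigma)
    sigma_inv = np.linalg.inv(sigma)
    loglik = -0.5 * m * (n * np.log(2 * np.pi) + logdet + np.trace(sigma_inv @ S))
    return loglik - lam * np.trace(sigma_inv)


def best_split(x, lam):
    """Return (gain, t) for the best single breakpoint t, splitting x[:t] | x[t:]."""
    base = segment_score(x, lam)
    best_gain, best_t = -np.inf, None
    for t in range(1, x.shape[0]):
        gain = segment_score(x[:t], lam) + segment_score(x[t:], lam) - base
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t


def greedy_segment(x, k, lam):
    """Greedily add up to k breakpoints; stop early if no split improves the score."""
    breaks = [0, x.shape[0]]
    for _ in range(k):
        candidates = []
        for a, b in zip(breaks[:-1], breaks[1:]):
            if b - a < 2:
                continue  # a one-sample segment cannot be split further
            gain, t = best_split(x[a:b], lam)
            candidates.append((gain, a + t))
        if not candidates:
            break
        gain, t = max(candidates)
        if gain <= 0:
            break
        breaks = sorted(breaks + [t])
    return breaks[1:-1]  # interior breakpoints only
```

For example, `greedy_segment(x, k=3, lam=1.0)` on a `(T, n)` array `x` returns a sorted list of at most three interior breakpoint indices. The values of `k` and `lam` play the role of the two hyperparameters mentioned in the abstract and would need to be chosen by validation.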
Pages: 727 - 751
Number of pages: 25
Related papers
50 in total
  • [31] Statistical Causality for Multivariate Nonlinear Time Series via Gaussian Process Models
    Zaremba, Anna B.
    Peters, Gareth W.
    [J]. METHODOLOGY AND COMPUTING IN APPLIED PROBABILITY, 2022, 24 (04) : 2587 - 2632
  • [32] Statistical Causality for Multivariate Nonlinear Time Series via Gaussian Process Models
    Anna B. Zaremba
    Gareth W. Peters
    [J]. Methodology and Computing in Applied Probability, 2022, 24 : 2587 - 2632
  • [33] Prediction of Multivariate Time Series with Sparse Gaussian Process Echo State Network
    Han, Min
    Ren, Weijie
    Xu, Meiling
    [J]. PROCEEDINGS OF THE 2013 FOURTH INTERNATIONAL CONFERENCE ON INTELLIGENT CONTROL AND INFORMATION PROCESSING (ICICIP), 2013 : 510 - 513
  • [34] Time-Varying Gaussian Markov Random Fields Learning for Multivariate Time Series Clustering
    Ding, Wangxiang
    Li, Wenzhong
    Zhang, Zhijie
    Wan, Chen
    Duan, Jianhui
    Lu, Sanglu
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (11) : 11950 - 11966
  • [35] SEGMENTATION OF PLANKTONIC MULTIVARIATE CHRONOLOGICAL SERIES
    IBANEZ, F
    [J]. OCEANOLOGICA ACTA, 1984, 7 (04) : 481 - 491
  • [36] An L0-Norm Regularized Method for Multivariate Time Series Segmentation
    Li, Min
    Huang, Yu-Mei
    [J]. EAST ASIAN JOURNAL ON APPLIED MATHEMATICS, 2022, 12 (02) : 353 - 366
  • [37] A dynamic customer segmentation approach by combining LRFMS and multivariate time series clustering
    Wang, Shuhai
    Sun, Linfu
    Yu, Yang
    [J]. SCIENTIFIC REPORTS, 2024, 14 (01):
  • [38] A hybrid segmentation method for multivariate time series based on the dynamic factor model
    Zhubin Sun
    Xiaodong Liu
    Lizhu Wang
    [J]. Stochastic Environmental Research and Risk Assessment, 2017, 31 : 1291 - 1304
  • [39] Batch Process Monitoring Based on Fuzzy Segmentation of Multivariate Time-Series
    Tanatavikorn, Harakhun
    Yamashita, Yoshiyuki
    [J]. JOURNAL OF CHEMICAL ENGINEERING OF JAPAN, 2017, 50 (01) : 53 - 63
  • [40] Fuzzy segmentation of multivariate time series with KPCA and G-G clustering
    Wang L.
    Zhu H.
    [J]. Kongzhi yu Juece/Control and Decision, 2021, 36 (01) : 115 - 124