On optimal multiple changepoint algorithms for large data

Cited by: 103
Authors
Maidstone, Robert [1]
Hocking, Toby [2]
Rigaill, Guillem [3]
Fearnhead, Paul [4]
Affiliations
[1] Univ Lancaster, STOR i Ctr Doctoral Training, Lancaster LA1 4YW, England
[2] McGill Univ & Genome Quebec Innovat Ctr, Montreal, PQ, Canada
[3] Univ Paris Sud, Univ d'Evry, Univ Paris Diderot, Sorbonne Pa, CNRS, INRA, UMR 9213, UMR 1403, Inst Plant Sci, Paris, France
[4] Univ Lancaster, Dept Math & Stat, Lancaster LA1 4YW, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Breakpoints; Dynamic Programming; FPOP; SNIP; Optimal Partitioning; pDPA; PELT; Segment Neighbourhood; DNA-SEQUENCE SEGMENTATION; CHANGE-POINTS; NUMBER; IDENTIFICATION; CRITERION; MODELS;
DOI
10.1007/s11222-016-9636-3
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Many common approaches to detecting change-points, for example based on statistical criteria such as penalised likelihood or minimum description length, can be formulated in terms of minimising a cost over segmentations. We focus on a class of dynamic programming algorithms that can solve the resulting minimisation problem exactly, and thus find the optimal segmentation under the given statistical criteria. The standard implementation of these dynamic programming methods has a computational cost that scales at least quadratically in the length of the time series. Recently, pruning ideas have been suggested that can speed up the dynamic programming algorithms whilst still being guaranteed to be optimal, in that they find the true minimum of the cost function. Here we extend these pruning methods and introduce two new algorithms for segmenting data: FPOP and SNIP. Empirical results show that FPOP is substantially faster than existing dynamic programming methods and that, unlike the existing methods, its computational efficiency is robust to the number of changepoints in the data. We evaluate the method for detecting copy number variations and observe that FPOP has a computational cost that is even competitive with that of binary segmentation, but can give much more accurate segmentations.
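As a rough illustration of the unpruned dynamic programme the abstract refers to, the sketch below minimises a penalised cost over all segmentations in quadratic time. It is not FPOP or SNIP and is not taken from the paper: the function name `optimal_partitioning`, the squared-error segment cost, and the per-changepoint penalty `beta` are all assumptions chosen for the example.

```python
import numpy as np

def optimal_partitioning(y, beta):
    """Exact penalised-cost segmentation by O(n^2) dynamic programming.

    Assumptions (illustrative, not from the source): the segment cost is the
    residual sum of squares around the segment mean, and every changepoint
    adds a penalty beta. Returns (minimum penalised cost, changepoints),
    where a changepoint t means a new segment starts at index t.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Cumulative sums let us evaluate any segment cost in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def seg_cost(a, b):
        # Residual sum of squares of y[a:b], with b > a.
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    F = np.empty(n + 1)          # F[t]: optimal penalised cost of y[0:t]
    F[0] = -beta                 # so the first segment carries no penalty
    last = np.zeros(n + 1, dtype=int)  # start index of the final segment
    for t in range(1, n + 1):
        best_cost, best_tau = np.inf, 0
        for tau in range(t):     # try every position for the last change
            c = F[tau] + seg_cost(tau, t) + beta
            if c < best_cost:
                best_cost, best_tau = c, tau
        F[t] = best_cost
        last[t] = best_tau

    # Backtrack through the stored segment starts to recover changepoints.
    cps = []
    t = n
    while t > 0:
        t = last[t]
        if t > 0:
            cps.append(int(t))
    return float(F[n]), sorted(cps)
```

The inner loop over every candidate `tau` is what makes the cost quadratic in the series length; pruning methods of the kind the abstract describes aim to discard candidates that can never be optimal, without changing the minimum that is found.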
Pages: 519-533
Page count: 15
Related papers
50 in total
  • [1] On optimal multiple changepoint algorithms for large data
    Robert Maidstone
    Toby Hocking
    Guillem Rigaill
    Paul Fearnhead
    Statistics and Computing, 2017, 27 : 519 - 533
  • [2] Multiple Changepoint Detection via Genetic Algorithms
    Li, Shanghong
    Lund, Robert
    JOURNAL OF CLIMATE, 2012, 25 (02) : 674 - 686
  • [3] Multiple changepoint detection in categorical data streams
    Joshua Plasse
    Niall M. Adams
    Statistics and Computing, 2019, 29 : 1109 - 1125
  • [4] Multiple changepoint detection in categorical data streams
    Plasse, Joshua
    Adams, Niall M.
    STATISTICS AND COMPUTING, 2019, 29 (05) : 1109 - 1125
  • [5] Scalable multiple changepoint detection for functional data sequences
    Harris, Trevor
    Li, Bo
    Tucker, J. Derek
    ENVIRONMETRICS, 2022, 33 (02)
  • [6] Adaptive MCMC for multiple changepoint analysis with applications to large datasets
    Benson, Alan
    Friel, Nial
    ELECTRONIC JOURNAL OF STATISTICS, 2018, 12 (02): : 3365 - 3396
  • [7] Multiple changepoint detection with partial information on changepoint times
    Li, Yingbo
    Lund, Robert
    Hewaarachchi, Anuradha
    ELECTRONIC JOURNAL OF STATISTICS, 2019, 13 (02): : 2462 - 2520
  • [8] A comparison of single and multiple changepoint techniques for time series data
    Shi, Xuesheng
    Gallagher, Colin
    Lund, Robert
    Killick, Rebecca
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2022, 170
  • [9] Bayesian optimal design for changepoint problems
    Atherton, Juli
    Charbonneau, Benoit
    Wolfson, David B.
    Joseph, Lawrence
    Zhou, Xiaojie
    Vandal, Alain C.
    CANADIAN JOURNAL OF STATISTICS-REVUE CANADIENNE DE STATISTIQUE, 2009, 37 (04): : 495 - 513
  • [10] Optimal algorithms for trading large positions
    Pemy, Moustapha
    AUTOMATICA, 2012, 48 (07) : 1353 - 1358