Leveraging change point detection to discover natural experiments in data

Cited by: 0
Authors
He, Yuzi [1, 2]
Burghardt, Keith A. [1 ]
Lerman, Kristina [1 ]
Affiliations
[1] Univ Southern Calif, Informat Sci Inst, Marina Del Rey, CA 90292 USA
[2] Univ Southern Calif, Dept Phys & Astron, Los Angeles, CA USA
Keywords
Change point detection; High-dimensional data; Regression discontinuity design; Causal effect; Regression discontinuity designs; Causal inference
DOI
10.1140/epjds/s13688-022-00361-7
Chinese Library Classification
O1 [Mathematics]
Discipline classification codes
0701; 070101
Abstract
Change point detection has many practical applications, from anomaly detection in data to scene changes in robotics; however, finding changes in high-dimensional data is an ongoing challenge. We describe a self-training, model-agnostic framework to detect changes in arbitrarily complex data. The method consists of two steps. First, it labels data as before or after a candidate change point and trains a classifier to predict these labels. The accuracy of this classifier varies across candidate change points. Second, by modeling how the accuracy changes, we infer the true change point and the fraction of data affected by the change (a proxy for detection confidence). We demonstrate that our framework achieves low bias over a wide range of conditions and detects changes in high-dimensional, noisy data more accurately than alternative methods. We use the framework to identify changes in real-world data and measure their effects using regression discontinuity designs, thereby uncovering potential natural experiments, such as the effect of pandemic lockdowns on air pollution and the effect of policy changes on performance and persistence in a learning platform. Our method opens new avenues for data-driven discovery due to its flexibility, accuracy, and robustness in identifying changes in data.
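
The first step described in the abstract (label data as before or after a candidate change point, train a classifier, and compare accuracy across candidates) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes synthetic Gaussian data with a mean shift, uses a hand-written nearest-centroid classifier as a stand-in for an arbitrary model, and replaces the paper's accuracy-curve modeling with a simpler rule: pick the candidate with the highest balanced accuracy. The names `candidate_scores`, `balanced_accuracy`, and `true_cp` are hypothetical.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so unequal class sizes do not inflate the score."""
    recalls = [np.mean(y_pred[y_true == k] == k) for k in (0, 1)]
    return float(np.mean(recalls))


def candidate_scores(X, candidates):
    """Score each candidate change point by how well 'before' vs 'after' is predicted."""
    t = np.arange(len(X))
    # Interleaved split: train and test both span the whole timeline.
    train, test = t[::2], t[1::2]
    scores = []
    for c in candidates:
        y = (t >= c).astype(int)  # 0 = before candidate, 1 = after
        # Stand-in classifier (assumption): nearest class centroid.
        mu0 = X[train][y[train] == 0].mean(axis=0)
        mu1 = X[train][y[train] == 1].mean(axis=0)
        d0 = np.linalg.norm(X[test] - mu0, axis=1)
        d1 = np.linalg.norm(X[test] - mu1, axis=1)
        y_pred = (d1 < d0).astype(int)
        scores.append(balanced_accuracy(y[test], y_pred))
    return np.array(scores)


# Synthetic data (assumption): 5-dimensional noise with a mean shift at true_cp.
rng = np.random.default_rng(0)
n, dim, true_cp = 400, 5, 250
X = rng.normal(size=(n, dim))
X[true_cp:] += 3.0

candidates = np.arange(50, 351, 10)
scores = candidate_scores(X, candidates)

# Simplification: take the best-scoring candidate instead of fitting the accuracy curve.
estimated_cp = int(candidates[np.argmax(scores)])
print("estimated change point:", estimated_cp)
```

When a candidate is placed away from the true change, one of the two labels mixes pre-change and post-change points, so the classifier cannot separate them cleanly and the score drops. This is why accuracy carries information about the change location.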
Pages: 16
Related papers
50 in total
  • [1] Leveraging change point detection to discover natural experiments in data
    Yuzi He
    Keith A. Burghardt
    Kristina Lerman
    [J]. EPJ Data Science, 11
  • [2] Change point detection in text data
    Preis A.
    Schwaar S.
    [J]. Behaviormetrika, 2024, 51 (1) : 477 - 496
  • [3] Leveraging Longitudinal Data for Cardiomegaly and Change Detection in Chest Radiography
    Belo, Raquel
    Rocha, Joana
    Pedrosa, Joao
    [J]. PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2023, PT I, 2024, 14469 : 434 - 448
  • [4] Leveraging Linked Data to Discover Semantic Relations Within Data Sources
    Taheriyan, Mohsen
    Knoblock, Craig A.
    Szekely, Pedro
    Ambite, Jose Luis
    [J]. SEMANTIC WEB - ISWC 2016, PT I, 2016, 9981 : 549 - 565
  • [5] Multiscale change point detection for dependent data
    Dette, Holger
    Eckle, Theresa
    Vetter, Mathias
    [J]. SCANDINAVIAN JOURNAL OF STATISTICS, 2020, 47 (04) : 1243 - 1274
  • [6] Predictive change point detection for heterogeneous data
    Anna-Christina Glock
    Florian Sobieczky
    Johannes Fürnkranz
    Peter Filzmoser
    Martin Jech
    [J]. Neural Computing and Applications, 2024, 36 (26) : 16071 - 16096
  • [7] Bayesian change point detection for functional data
    Li, Xiuqi
    Ghosal, Subhashis
    [J]. JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2021, 213 : 193 - 205
  • [8] Change point detection for clustered expression data
    Miriam Sieg
    Lina Katrin Sciesielski
    Karin Michaela Kirschner
    Jochen Kruppa
    [J]. BMC Genomics, 23
  • [9] Change-point detection in angular data
    Grabovsky, I
    Horváth, L
    [J]. ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 2001, 53 (03) : 552 - 566
  • [10] Change-point detection in panel data
    Horvath, Lajos
    Huskova, Marie
    [J]. JOURNAL OF TIME SERIES ANALYSIS, 2012, 33 (04) : 631 - 648