catch22: CAnonical Time-series CHaracteristics Selected through highly comparative time-series analysis

Cited by: 0
Authors
Carl H. Lubba
Sarab S. Sethi
Philip Knaute
Simon R. Schultz
Ben D. Fulcher
Nick S. Jones
Affiliations
[1] Imperial College London, Department of Bioengineering
[2] Imperial College London, Department of Mathematics
[3] The University of Sydney, School of Physics, Faculty of Science
Keywords
Time series; Classification; Clustering; Time-series features
DOI
Not available
Abstract
Capturing the dynamical properties of time series concisely as interpretable feature vectors can enable efficient clustering and classification for time-series applications across science and industry. Selecting an appropriate feature-based representation of time series for a given application can be achieved through systematic comparison across a comprehensive time-series feature library, such as those in the hctsa toolbox. However, this approach is computationally expensive and involves evaluating many similar features, limiting the widespread adoption of feature-based representations of time series for real-world applications. In this work, we introduce a method to infer small sets of time-series features that (i) exhibit strong classification performance across a given collection of time-series problems, and (ii) are minimally redundant. Applying our method to a set of 93 time-series classification datasets (containing over 147,000 time series) and using a filtered version of the hctsa feature library (4791 features), we introduce a set of 22 CAnonical Time-series CHaracteristics, catch22, tailored to the dynamics typically encountered in time-series data-mining tasks. This dimensionality reduction, from 4791 to 22, is associated with an approximately 1000-fold reduction in computation time and near linear scaling with time-series length, despite an average reduction in classification accuracy of just 7%. catch22 captures a diverse and interpretable signature of time series in terms of their properties, including linear and non-linear autocorrelation, successive differences, value distributions and outliers, and fluctuation scaling properties. We provide an efficient implementation of catch22, accessible from many programming environments, that facilitates feature-based time-series analysis for scientific, industrial, financial and medical applications using a common language of interpretable time-series properties.
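The two selection criteria named in the abstract (strong classification performance and minimal redundancy) can be illustrated with a small sketch. This is not the authors' pipeline, whose steps the abstract does not spell out. It is a simple greedy filter that assumes each feature already has a performance score and that redundancy is measured by absolute Pearson correlation between feature values. The function names, the 0.8 threshold, and the toy numbers are all hypothetical.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0  # constant feature: treat as uncorrelated
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (sx * sy)


def select_features(values, scores, max_abs_corr=0.8):
    """Greedily keep high-scoring features that are not strongly
    correlated with any feature already kept.

    values: dict mapping feature name -> list of values, one per time series
    scores: dict mapping feature name -> performance score (higher is better)
    max_abs_corr: hypothetical redundancy threshold (an assumption)
    """
    kept = []
    # Visit features from best to worst score.
    for name in sorted(scores, key=scores.get, reverse=True):
        if all(abs(pearson(values[name], values[k])) < max_abs_corr for k in kept):
            kept.append(name)
    return kept


# Toy, made-up numbers for illustration only.
values = {
    "f_a": [1.0, 2.0, 3.0, 4.0],
    "f_b": [2.0, 4.0, 6.0, 8.1],    # nearly a rescaled copy of f_a
    "f_c": [1.0, -1.0, 1.0, -1.0],
}
scores = {"f_a": 0.90, "f_b": 0.88, "f_c": 0.70}
selected = select_features(values, scores)
print(selected)
```

In this toy example f_b scores well but is dropped because it nearly duplicates f_a, while the lower-scoring f_c is kept because it carries different information.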
Pages: 1821-1852
Page count: 31