Mining lake time series using symbolic representation

Cited by: 6
Authors
Ruan, Guangchen [1]
Hanson, Paul C. [2]
Dugan, Hilary A. [2]
Plale, Beth [1]
Affiliations
[1] Indiana University, School of Informatics and Computing, 919 E 10th St, Bloomington, IN 47408, USA
[2] University of Wisconsin, Center for Limnology, 680 North Park St, Madison, WI 53706, USA
Funding
U.S. National Science Foundation (NSF)
Keywords
Lake time series; Symbolic representation; Mining; Evolution; Model
DOI
10.1016/j.ecoinf.2017.03.001
Chinese Library Classification (CLC) number
Q14 [Ecology (Biological Ecology)]
Discipline classification codes
071012; 0713
Abstract
Sensor networks deployed in lakes and reservoirs, when combined with simulation models and expert knowledge from the global community, are creating a deeper understanding of the ecological dynamics of lakes. However, the volume of data and the complexity of its patterns demand substantial compute resources and efficient data mining algorithms, both of which lie beyond the realm of traditional limnological research. This paper adapts methods from computer science to data-intensive ecological questions in order to give ecologists an approachable methodology for knowledge discovery in lake ecology. We apply a state-of-the-art time series mining technique based on symbolic representation, Symbolic Aggregate approXimation (SAX), to high-frequency time series of phycocyanin (PHYCO) and chlorophyll (CHLORO) fluorescence, both indicators of algal biomass in lakes, as well as to model predictions of algal biomass (MODEL). Using data mining techniques, we demonstrate that MODEL predicts PHYCO better than it predicts CHLORO. All three time series are highly redundant, so each reduces to a relatively small subset of unique patterns. However, MODEL is much less complex than either PHYCO or CHLORO and fails to reproduce the high-biomass periods indicative of algal blooms. We develop a set of R tools for motif discovery and anomaly detection within a single lake time series, and for studying relationships among multiple lake time series through distance metrics, clustering and classification. To reduce computation times, we also provide web services that launch the R tools remotely on high performance computing (HPC) resources. Comprehensive experimental results on observational and simulated lake data demonstrate the effectiveness of our approach. (C) 2017 Elsevier B.V. All rights reserved.
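The abstract names SAX but does not spell out how it works. What follows is a minimal Python sketch of the standard SAX pipeline. Each window is first z-normalised. It is then reduced with Piecewise Aggregate Approximation (PAA) to w segment means. Finally, each mean is mapped to one of a letters using breakpoints that split N(0, 1) into equiprobable regions. A naive count of sliding-window SAX words is added to show how redundancy in a series collapses into a small set of unique patterns. This is not the authors' R tooling. The function names, the window length, the word size `w` and the alphabet size `a` are illustrative assumptions, and the input is synthetic, not lake data.

```python
import bisect
import math
from collections import Counter
from statistics import NormalDist


def znorm(series, eps=1e-8):
    """Z-normalise a sequence; a (near-)constant sequence maps to all zeros."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std < eps:
        return [0.0] * n
    return [(x - mean) / std for x in series]


def paa(series, w):
    """Piecewise Aggregate Approximation: the mean of each of w segments.

    Segment boundaries use integer division, so segment lengths differ by at
    most one point when len(series) is not a multiple of w.
    """
    n = len(series)
    if not 1 <= w <= n:
        raise ValueError("need 1 <= w <= len(series)")
    means = []
    for i in range(w):
        lo, hi = i * n // w, (i + 1) * n // w
        means.append(sum(series[lo:hi]) / (hi - lo))
    return means


def breakpoints(a):
    """Cut points dividing N(0, 1) into a equiprobable regions."""
    return [NormalDist().inv_cdf(k / a) for k in range(1, a)]


def sax(series, w, a):
    """Convert a sequence into a SAX word of length w over letters 'a', 'b', ..."""
    if not 2 <= a <= 26:
        raise ValueError("alphabet size must be between 2 and 26")
    cuts = breakpoints(a)
    # bisect_right counts the breakpoints <= value, which is the letter index.
    return "".join(chr(ord("a") + bisect.bisect_right(cuts, v))
                   for v in paa(znorm(series), w))


def sax_word_counts(series, window, w, a):
    """Hypothetical helper: count SAX words over all sliding windows."""
    return Counter(sax(series[i:i + window], w, a)
                   for i in range(len(series) - window + 1))


if __name__ == "__main__":
    # Synthetic, exactly periodic "hourly" series: one daily cycle repeated for 7 days.
    day = [math.sin(2 * math.pi * h / 24) for h in range(24)]
    counts = sax_word_counts(day * 7, window=24, w=4, a=4)
    print(len(counts), "unique words out of", sum(counts.values()), "windows")
    print(counts.most_common(3))
```

Because the synthetic series repeats exactly every 24 points, its 145 windows yield at most 24 distinct words. This is a toy version of the redundancy the abstract reports. With real sensor data, frequent words are only rough candidates for motifs and words seen once are only rough candidates for anomalies. Dedicated algorithms such as HOT SAX for discord discovery are more rigorous.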
Pages: 10-22
Page count: 13
Related papers (50 in total)
  • [21] A non-parametric symbolic approximate representation for long time series
    He, Xiaoxu
    Shao, Chenxi
    Xiong, Yan
    PATTERN ANALYSIS AND APPLICATIONS, 2016, 19 (01) : 111 - 127
  • [22] A Time-Series Representation for Temporal Web Mining Using a Data Band Approach
    Samia, Mireille
    Conrad, Stefan
    DATABASES AND INFORMATION SYSTEMS IV, 2007, 155 : 161 - 174
  • [23] Trend-based Symbolic Aggregate Approximation for Time Series Representation
    Zhang, Ke
    Li, Yuan
    Chai, Yi
    Huang, Lei
    PROCEEDINGS OF THE 30TH CHINESE CONTROL AND DECISION CONFERENCE (2018 CCDC), 2018, : 2234 - 2240
  • [24] Analysing Time Series using Symbolic Representations
    Monetti, R.
    Bunk, W.
    Jamitzky, F.
    TOPICS ON CHAOTIC SYSTEMS, 2009, : 242 - 250
  • [25] 1d-SAX: A Novel Symbolic Representation for Time Series
    Malinowski, Simon
    Guyet, Thomas
    Quiniou, Rene
    Tavenard, Romain
    ADVANCES IN INTELLIGENT DATA ANALYSIS XII, 2013, 8207 : 273 - 284
  • [26] Symbolic Representation Based on Temporal Order Information for Time Series Classification
    Zalewski, Willian
    Silva, Fabiano
    Maletzke, Andre Gustavo
    Wu, Feng Chung
    Lee, Huei Diana
    2013 BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 2013, : 95 - 100
  • [28] SAXO: An Optimized Data-driven Symbolic Representation of Time Series
    Bondu, A.
    Boulle, M.
    Grossin, B.
    2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2013,
  • [29] Harmony Search Algorithm for Optimal Word Size in Symbolic Time Series Representation
    Ahmed, Almahdi Mohammed
    Abu Bakar, Azuraliza
    Hamdan, Abdul Razak
    2011 3RD CONFERENCE ON DATA MINING AND OPTIMIZATION (DMO), 2011, : 57 - 62
  • [30] A fast sorting-based aggregation method for symbolic time series representation
    Chen, Xinye
    Guttel, Stefan
    21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS ICDMW 2021, 2021, : 1009 - 1016