PISA: an Index for Aggregating Big Time Series Data

Cited by: 4
Authors
Huang, Xiangdong [1]
Wang, Jianmin [1, 2]
Wong, Raymond K. [3]
Zhang, Jinrui [1]
Wang, Chen [1]
Affiliations
[1] Tsinghua Univ, Sch Software, Beijing 100084, Peoples R China
[2] TNList, Inst Data Sci, Beijing 100084, Peoples R China
[3] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW, Australia
Keywords
temporal data; aggregation index; COMPUTATION;
DOI
10.1145/2983323.2983775
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Aggregation operations play an important role in time series database management. As the amount of data increases, current solutions such as summary tables and MapReduce-based methods struggle to answer such queries with low latency. Other approaches, such as segment-tree-based methods, have poor insertion performance when the data size exceeds the available memory. This paper proposes a new segment-tree-based index called PISA, which offers fast insertion and low latency for aggregation queries. PISA uses a forest to overcome the insertion performance disadvantages of traditional segment trees. By defining two kinds of tags, namely code number and serial number, we propose an algorithm that accelerates queries by avoiding unnecessary reads of data on disk. The index is stored on disk and takes only a few hundred bytes of memory for billions of data points. PISA can be easily implemented on both traditional databases and NoSQL systems, such as MySQL and Cassandra. It handles aggregation queries within milliseconds on a commodity server for a time range that may contain tens of billions of data points.
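To make the forest idea in the abstract more concrete, the following is a minimal in-memory sketch of an append-only forest of perfect binary segment trees that answers range-sum queries. It is an illustration under stated assumptions, not the PISA implementation: the names `Node`, `ForestIndex`, `append`, and `range_sum` are hypothetical, timestamps are assumed to arrive in strictly increasing order, and the paper's code number and serial number tags and its on-disk layout are not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    start: int                      # earliest timestamp covered by this node
    end: int                        # latest timestamp covered by this node
    total: float                    # sum of all values in [start, end]
    count: int                      # number of data points covered
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class ForestIndex:
    """Hypothetical append-only forest of perfect binary segment trees."""

    def __init__(self) -> None:
        self.roots: List[Node] = []  # tree sizes strictly decrease left to right

    def append(self, ts: int, value: float) -> None:
        # Assumption: points arrive in strictly increasing timestamp order.
        if self.roots and ts <= self.roots[-1].end:
            raise ValueError("timestamps must be strictly increasing")
        node = Node(ts, ts, value, 1)
        # Merge equally sized trees, like carrying in a binary counter,
        # so an insertion never rebuilds an existing tree.
        while self.roots and self.roots[-1].count == node.count:
            left = self.roots.pop()
            node = Node(left.start, node.end,
                        left.total + node.total,
                        left.count + node.count,
                        left, node)
        self.roots.append(node)

    def range_sum(self, lo: int, hi: int) -> float:
        # Sum the values whose timestamps fall in the closed range [lo, hi].
        return sum(self._sum(root, lo, hi) for root in self.roots)

    @staticmethod
    def _sum(node: Optional[Node], lo: int, hi: int) -> float:
        if node is None or node.end < lo or node.start > hi:
            return 0.0               # no overlap: skip the whole subtree
        if lo <= node.start and node.end <= hi:
            return node.total        # fully covered: use the stored aggregate
        return (ForestIndex._sum(node.left, lo, hi)
                + ForestIndex._sum(node.right, lo, hi))


index = ForestIndex()
for t in range(1, 11):
    index.append(t, float(t))
print(index.range_sum(3, 7))         # 3 + 4 + 5 + 6 + 7 = 25.0
```

In this sketch, ten appended points leave two trees of sizes 8 and 2, and a query only descends into subtrees that partially overlap the requested range.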
Pages: 979-988
Page count: 10