PISA: an Index for Aggregating Big Time Series Data

Cited by: 4
Authors
Huang, Xiangdong [1]
Wang, Jianmin [1, 2]
Wong, Raymond K. [3]
Zhang, Jinrui [1]
Wang, Chen [1]
Affiliations
[1] Tsinghua Univ, Sch Software, Beijing 100084, Peoples R China
[2] TNList, Inst Data Sci, Beijing 100084, Peoples R China
[3] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW, Australia
Keywords
temporal data; aggregation index; COMPUTATION;
DOI
10.1145/2983323.2983775
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Aggregation plays an important role in time series database management. As data volumes grow, current solutions such as summary tables and MapReduce-based methods struggle to answer aggregation queries with low latency. Other approaches, such as segment-tree-based methods, have poor insertion performance when the data size exceeds the available memory. This paper proposes a new segment-tree-based index called PISA, which offers fast insertion and low-latency aggregation queries. PISA uses a forest to overcome the insertion performance disadvantages of traditional segment trees. By defining two kinds of tags, namely code number and serial number, we propose an algorithm that accelerates queries by avoiding unnecessary reads from disk. The index is stored on disk and takes only a few hundred bytes of memory for billions of data points. PISA can be easily implemented on both traditional databases and NoSQL systems, such as MySQL and Cassandra. It handles aggregation queries within milliseconds on a commodity server for a time range that may contain tens of billions of data points.
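The forest idea in the abstract can be illustrated with a minimal in-memory sketch. This is not the PISA algorithm itself: the abstract does not define the code number and serial number tags or the on-disk layout, so they are omitted here. The class name `AppendOnlyForest`, its methods, and the use of a sum aggregate are all assumptions made for illustration. The sketch only shows why a forest of perfect binary trees keeps appends cheap: new points become one-node trees, and equal-sized neighbours are merged instead of rebalancing one large tree.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    lo: int                       # first covered position (inclusive)
    hi: int                       # end of covered positions (exclusive)
    total: float                  # pre-aggregated sum over [lo, hi)
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class AppendOnlyForest:
    """Hypothetical append-only forest of perfect binary sum trees."""

    def __init__(self) -> None:
        self.roots: List[Node] = []   # tree sizes strictly decrease left to right
        self.count = 0                # number of points appended so far

    def append(self, value: float) -> None:
        # Every new point starts as a tree of size one.
        node = Node(self.count, self.count + 1, value)
        self.count += 1
        # Merge with the rightmost tree while both cover the same number of points.
        while self.roots and (self.roots[-1].hi - self.roots[-1].lo) == (node.hi - node.lo):
            left = self.roots.pop()
            node = Node(left.lo, node.hi, left.total + node.total, left, node)
        self.roots.append(node)

    def range_sum(self, lo: int, hi: int) -> float:
        # Aggregate over positions [lo, hi) by visiting each tree in the forest.
        return sum(self._query(root, lo, hi) for root in self.roots)

    def _query(self, node: Optional[Node], lo: int, hi: int) -> float:
        if node is None or hi <= node.lo or node.hi <= lo:
            return 0.0                # no overlap with the query range
        if lo <= node.lo and node.hi <= hi:
            return node.total         # fully covered: reuse the stored aggregate
        return self._query(node.left, lo, hi) + self._query(node.right, lo, hi)


forest = AppendOnlyForest()
for v in range(1, 11):
    forest.append(v)
print(forest.range_sum(2, 7))         # sum of the values at positions 2..6
```

In this sketch the number of roots equals the number of one-bits in the point count, so only a logarithmic number of roots has to be tracked. That is in the spirit of the small memory footprint the abstract reports, though the paper's actual bookkeeping may differ.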
Pages: 979-988
Number of pages: 10