PISA: an Index for Aggregating Big Time Series Data

Cited by: 4

Authors:

Huang, Xiangdong ^{[1]}

Wang, Jianmin ^{[1,2]}

Wong, Raymond K. ^{[3]}

Zhang, Jinrui ^{[1]}

Wang, Chen ^{[1]}

Affiliations:

[1] Tsinghua Univ, Sch Software, Beijing 100084, Peoples R China

[2] TNList, Inst Data Sci, Beijing 100084, Peoples R China

[3] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW, Australia

Source:

CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT | 2016

Keywords:

temporal data; aggregation index; COMPUTATION;

DOI:

10.1145/2983323.2983775

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Aggregation operations play an important role in time series database management. As the amount of data increases, current solutions such as summary tables and MapReduce-based methods struggle to respond to such queries with low latency. Other approaches, such as segment-tree-based methods, have poor insertion performance when the data size exceeds the available memory. This paper proposes a new segment-tree-based index called PISA, which offers fast insertion and low-latency aggregation queries. PISA uses a forest to overcome the insertion performance disadvantages of traditional segment trees. By defining two kinds of tags, namely code number and serial number, we propose an algorithm that accelerates queries by avoiding unnecessary reads of data on disk. The index is stored on disk and takes only a few hundred bytes of memory for billions of data points. PISA can be easily implemented on both traditional databases and NoSQL systems, such as MySQL and Cassandra. It handles aggregation queries within milliseconds on a commodity server for a time range that may contain tens of billions of data points.
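
To make the forest idea in the abstract more concrete, the following is a minimal sketch of an append-only forest of complete binary aggregation trees. It is not the PISA algorithm itself: the class name `AppendOnlyAggForest`, the in-memory dictionary, the use of point positions instead of timestamps, and the restriction to sums are all assumptions made for illustration, and the paper's code number and serial number tags are not reproduced. The sketch only shows why a forest can make insertion cheap: a parent node is written once, when its subtree becomes complete, and existing nodes are never updated.

```python
class AppendOnlyAggForest:
    """Hypothetical illustration: a forest of complete binary sum trees."""

    def __init__(self):
        # (level, index) -> sum over positions
        # [index * 2**level, (index + 1) * 2**level)
        self.nodes = {}
        self.count = 0  # number of points appended so far

    def append(self, value):
        """Append one point; create parents only for subtrees that are now complete."""
        level, index, agg = 0, self.count, value
        self.nodes[(level, index)] = agg
        # An odd index means the left sibling already exists, so the parent
        # subtree is complete and can be written once and never touched again.
        while index % 2 == 1:
            agg += self.nodes[(level, index - 1)]
            level, index = level + 1, index // 2
            self.nodes[(level, index)] = agg
        self.count += 1

    def range_sum(self, lo, hi):
        """Sum of the points at positions lo .. hi-1, using the largest aligned blocks."""
        if not 0 <= lo <= hi <= self.count:
            raise ValueError("range out of bounds")
        total, pos = 0, lo
        while pos < hi:
            level = 0
            # Grow the block while it stays aligned and inside the query range.
            while pos % (1 << (level + 1)) == 0 and pos + (1 << (level + 1)) <= hi:
                level += 1
            total += self.nodes[(level, pos >> level)]
            pos += 1 << level
        return total


forest = AppendOnlyAggForest()
for v in range(1, 11):  # append the values 1..10
    forest.append(v)
print(forest.range_sum(0, 10))
print(forest.range_sum(3, 7))
```

In this sketch a range query touches a logarithmic number of precomputed nodes rather than every raw point, which is the general property that segment-tree-style indexes rely on; how PISA lays the nodes out on disk and prunes reads is described in the paper, not here.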


Pages: 979-988

Page count: 10

50 records in total

[1] Dual-PISA: An index for aggregation operations on time series data
Qiao, Jialin
Huang, Xiangdong
Wang, Jianmin
Wong, Raymond K.
[J]. INFORMATION SYSTEMS, 2020, 87
[2] Urban and regional distinctions for aggregating time series data
Cutler, Harvey
England, Scott
Weiler, Stephan
[J]. PAPERS IN REGIONAL SCIENCE, 2007, 86 (04) : 575 - 595
[3] Skyline index for time series data
Li, QZ
López, IFV
Moon, B
[J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2004, 16 (06) : 669 - 684
[4] Compressing Sampling for Time Series Big Data
Miao Bei-bei
Jin Xue-bo
[J]. 2015 34TH CHINESE CONTROL CONFERENCE (CCC), 2015 : 4957 - 4961
[5] Real Time Interpretation and Optimization of Time Series Data Stream in Big Data
Jiang, Zheyuan
Liu, Ke
[J]. 2018 IEEE 3RD INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS (ICCCBDA), 2018 : 243 - 247
[6] Ensemble-Based Tracking: Aggregating Crowdsourced Structured Time Series Data
Wang, Naiyan
Yeung, Dit-Yan
[J]. INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 32 (CYCLE 2), 2014, 32 : 1107 - 1115
[7] Mining and Forecasting of Big Time-series Data
Sakurai, Yasushi
Matsubara, Yasuko
Faloutsos, Christos
[J]. SIGMOD'15: PROCEEDINGS OF THE 2015 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2015 : 919 - 922
[8] Efficient Geospatial Analytics on Time Series Big Data
Al Jawarneh, Isam Mashhour
Bellavista, Paolo
Corradi, Antonio
Foschini, Luca
Montanari, Rebecca
[J]. IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2022), 2022 : 3002 - 3008
[9] Time-Series Big Data Stream Evaluation
Mursanto, Petrus
Wibisono, Ari
Bayu, Wendy D. W. T.
Ahli, Valian Fil
Rizki, May Iffah
Hasani, Lintang Matahari
Adibah, Jihan
[J]. 2020 5TH INTERNATIONAL WORKSHOP ON BIG DATA AND INFORMATION SECURITY (IWBIS 2020), 2020 : 43 - 47
