PISA: an Index for Aggregating Big Time Series Data

Cited by: 4

Authors:

Huang, Xiangdong ^{[1]}

Wang, Jianmin ^{[1,2]}

Wong, Raymond K. ^{[3]}

Zhang, Jinrui ^{[1]}

Wang, Chen ^{[1]}

Affiliations:

[1] Tsinghua Univ, Sch Software, Beijing 100084, Peoples R China

[2] TNList, Inst Data Sci, Beijing 100084, Peoples R China

[3] Univ New South Wales, Sch Comp Sci & Engn, Sydney, NSW, Australia

Source:

CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT | 2016

Keywords:

temporal data; aggregation index; COMPUTATION;

DOI:

10.1145/2983323.2983775

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Aggregation operation plays an important role in time series database management. As the amount of data increases, current solutions such as summary table and MapReduce-based methods struggle to respond to such queries with low latency. Other approaches such as segment tree based methods have a poor insertion performance when the data size exceeds the available memory. This paper proposes a new segment tree based index called PISA, which has fast insertion performance and low latency for aggregation queries. PISA uses a forest to overcome the performance disadvantages of insertions in traditional segment trees. By defining two kinds of tags, namely code number and serial number, we propose an algorithm to accelerate queries by avoiding reading unnecessary data on disk. The index is stored on disk and only takes a few hundred bytes of memory for billions of data points. PISA can be easily implemented on both traditional databases and NoSQL systems, examples including MySQL and Cassandra. It handles aggregation queries within milliseconds on a commodity server for a time range that may contain tens of billions of data points.
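
The general idea of answering range aggregations from a forest of segment trees can be illustrated with a small sketch. This is not the PISA algorithm from the paper: it keeps every node in memory, indexes points by arrival position rather than by timestamp, and does not model the code number or serial number tags. The names `Node`, `AggregationForest`, `append`, and `range_sum` are hypothetical and chosen only for this illustration. Appending merges equal-sized trees the way a binary counter carries, so no existing tree is ever rebuilt.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    start: int                      # position of the first point covered
    size: int                       # number of points covered (a power of two)
    total: float                    # pre-aggregated sum of the covered points
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class AggregationForest:
    """Append-only forest of perfect binary trees (illustrative only)."""

    def __init__(self) -> None:
        self.roots: List[Node] = []  # tree sizes strictly decrease left to right
        self.count = 0               # number of points appended so far

    def append(self, value: float) -> None:
        node = Node(self.count, 1, value)
        self.count += 1
        # Merge trees of equal size, like carrying in a binary counter.
        while self.roots and self.roots[-1].size == node.size:
            left = self.roots.pop()
            node = Node(left.start, left.size * 2,
                        left.total + node.total, left, node)
        self.roots.append(node)

    def range_sum(self, lo: int, hi: int) -> float:
        """Sum of the values at positions in the half-open range [lo, hi)."""
        return sum(self._query(root, lo, hi) for root in self.roots)

    def _query(self, node: Node, lo: int, hi: int) -> float:
        end = node.start + node.size
        if hi <= node.start or end <= lo:
            return 0.0               # no overlap with the query range
        if lo <= node.start and end <= hi:
            return node.total        # fully covered: use the stored aggregate
        # Partial overlap only happens on internal nodes, so children exist.
        return self._query(node.left, lo, hi) + self._query(node.right, lo, hi)


forest = AggregationForest()
for v in range(1, 11):               # append the values 1..10
    forest.append(float(v))
```

After ten appends the forest holds two trees covering 8 and 2 points. A query that fully covers a tree reads only its root aggregate, which is the property that lets a segment tree based index skip most of the underlying data.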


Pages: 979-988

Page count: 10
