Finding needles in a hay stream: On persistent item lookup in data streams

Cited by: 4
Authors
Chen, Lin [1]
Dai, Haipeng [2]
Meng, Lei [2]
Yu, Jihong [3]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing, Peoples R China
[3] Beijing Inst Technol, Sch Informat & Elect, Beijing, Peoples R China
Keywords
Persistent item lookup; Data stream mining; Sketch
DOI
10.1016/j.comnet.2020.107518
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In a data stream composed of an ordered sequence of data items, persistent items refer to those persisting to occur over a long timespan. Compared with ordinary items, persistent ones, though not necessarily occurring more frequently, typically convey more valuable information. Persistent item lookup, the functionality to identify all persistent items, emerges as a pivotal building block in many computing and network systems. In this paper, we devise a generic persistent item lookup algorithm supporting high-speed, high-accuracy lookup with limited memory cost. The key technicalities we propose in our design are two-fold. First, our algorithm attempts to record only persistent items seen so far based on the currently available information about the stream, thus significantly reducing memory overhead, especially for real-life highly skewed data streams. Second, our algorithm balances the recording load in both time and space domains: in the time domain, we partition persistent items into approximately equal-size subsets and record only one subset in each epoch; in the space domain, we apply the state-of-the-art load balancing technique to evenly distribute recorded items across the on-die memory. By holistically integrating these components, we iron out a persistent item lookup algorithm outperforming existing solutions in a wide range of practical settings.
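To make the notion of persistence concrete, the following is a minimal sketch of an exact, memory-unbounded baseline, not the algorithm proposed in the paper. It assumes the stream arrives as (epoch, item) pairs and that an item counts as persistent when it appears in at least `threshold` distinct epochs; the function name `persistent_items`, the `threshold` parameter, and the sample stream are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import Hashable, Iterable


def persistent_items(
    stream: Iterable[tuple[int, Hashable]], threshold: int
) -> set[Hashable]:
    """Return the items seen in at least `threshold` distinct epochs.

    Assumption: persistence is measured by the number of distinct epochs
    in which an item occurs, not by its total number of occurrences.
    """
    epochs_seen: dict[Hashable, set[int]] = defaultdict(set)
    for epoch, item in stream:
        # Repeated occurrences within one epoch add nothing new.
        epochs_seen[item].add(epoch)
    return {
        item for item, epochs in epochs_seen.items() if len(epochs) >= threshold
    }


# Hypothetical stream: "b" occurs most often (5 times) but in only 2 epochs,
# while "a" occurs 4 times spread over 4 epochs.
stream = [
    (0, "a"), (0, "b"), (0, "b"), (0, "b"),
    (1, "a"), (1, "c"),
    (2, "a"), (2, "b"), (2, "b"),
    (3, "a"),
]
result = persistent_items(stream, threshold=3)
```

With this sample stream, only "a" is reported at `threshold=3`, even though "b" is more frequent. The baseline stores every item it sees, which is the memory cost that sketch-based designs such as the one described in the abstract aim to avoid.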
Number of pages: 11
Related papers
50 in total
  • [1] Finding the hottest item in data streams
    Lin, Huaizhong
    Wu, Shanshan
    Hou, Leong U.
    Kou, Ngai Meng
    Gao, Yunjun
    Lu, Dongming
    INFORMATION SCIENCES, 2018, 430 : 314 - 330
  • [2] Finding Persistent Items in Data Streams
    Dai, Haipeng
    Shahzad, Muhammad
    Liu, Alex X.
    Zhong, Yuankun
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2016, 10 (04): : 289 - 300
  • [3] Finding a duplicate and a missing item in a stream
    Tarui, Jun
    Theory and Applications of Models of Computation, Proceedings, 2007, 4484 : 128 - 135
  • [4] Finding Persistent Items using Invertible Bloom Lookup Table
    Lv, Zhoudan
    He, Feng
    Chen, Lin
    PROCEEDING OF THE 2019 INTERNATIONAL CONFERENCE ON COMPUTER, INFORMATION AND TELECOMMUNICATION SYSTEMS (IEEE CITS 2019), 2019, : 101 - 105
  • [5] P-Sketch: A Fast and Accurate Sketch for Persistent Item Lookup
    Li, Weihe
    Patras, Paul
    IEEE-ACM TRANSACTIONS ON NETWORKING, 2024, 32 (02) : 987 - 1002
  • [6] Finding the Right Needles in Hay Helping Program Comprehension of Large Software Systems
    Sora, Ioana
    ENASE 2015 - PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON EVALUATION OF NOVEL APPROACHES TO SOFTWARE ENGINEERING, 2015, : 129 - 140
  • [7] Finding duplicates in a data stream
    Gopalan, Parikshit
    Radhakrishnan, Jaikumar
    PROCEEDINGS OF THE TWENTIETH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2009, : 402 - +
  • [8] Filtering Log Data: Finding the Needles in the Haystack
    Yu, Li
    Zheng, Ziming
    Lan, Zhiling
    Jones, Terry
    Brandt, Jim M.
    Gentile, Ann C.
    2012 42ND ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS (DSN), 2012,
  • [9] Altmetrics: Finding Meaningful Needles in the Data Haystack
    Crotty, David
    SERIALS REVIEW, 2014, 40 (03) : 141 - 146
  • [10] Stream operators for querying data streams
    Ma, LS
    Viglas, SD
    Li, M
    Li, Q
    ADVANCES IN WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2005, 3739 : 404 - 415