Mining Robust Frequent Items in Data Streams

Authors
Xia, Rui [1]
Dai, Haipeng [1]
Du, Zhanchao [2]
Li, Meng [1]
Liu, Alex X. [1]
Chen, Guihai [1]
Affiliations
[1] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[2] China Acad Space Technol, Inst Manned Space Syst Engn, Beijing 100094, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
DOI
10.1109/JCC49151.2020.00026
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This paper studies the problem of mining robust frequent items in data streams, which generalizes traditional frequent items mining by accounting for noise in the data. Because of noise, different items may correspond to the same entity; examples include different images of the same object and fluctuating sensor readings taken in the same setting. Our objective is to identify the items that correspond to the same entity and whose aggregated frequency exceeds a given threshold; we call these robust frequent items. To the best of our knowledge, there is no existing work on mining robust frequent items in a data stream. We first propose a scheme that applies sampling and spatial partitioning to address the problem in low-dimensional spaces. We then extend this algorithmic framework to high-dimensional spaces by incorporating locality-sensitive hashing to handle the approximate nearest neighbor problem. We evaluate our scheme on synthetic datasets and compare it with two adapted prior schemes. The results show that our algorithms outperform the adaptive Space Saving scheme by 14.8% in precision and 9.8% in recall on average.
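The idea of aggregating noisy items by spatial partition can be illustrated with a minimal sketch. This is not the paper's algorithm: it omits sampling and locality-sensitive hashing, keeps exact counts in memory, and treats every point that falls in the same grid cell as the same entity. The names `cell_of`, `robust_frequent_cells`, `cell_size`, and `threshold` are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import floor


def cell_of(point, cell_size):
    """Map a point to the integer coordinates of the grid cell containing it."""
    return tuple(floor(x / cell_size) for x in point)


def robust_frequent_cells(stream, cell_size, threshold):
    """Count points per grid cell and keep cells whose count exceeds threshold.

    Assumption (not from the paper): noisy copies of one entity land in the
    same cell, so the cell count stands in for the aggregated frequency.
    """
    counts = Counter()
    for point in stream:
        counts[cell_of(point, cell_size)] += 1
    return {cell: n for cell, n in counts.items() if n > threshold}


# Four noisy readings near (0.1, 0.1) and one unrelated reading.
stream = [(0.10, 0.12), (0.11, 0.13), (0.12, 0.11), (5.0, 5.1), (0.13, 0.14)]
result = robust_frequent_cells(stream, cell_size=1.0, threshold=3)
print(result)
```

A fixed grid can split one entity across neighboring cells when its points straddle a cell boundary, so a practical scheme would need some way to handle that case; this sketch does not.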
Pages: 110-117
Page count: 8