Join Query Processing in Data Quality Management

Cited by: 3
Authors
Yue, Mingliang [1]
Gao, Hong [1]
Shi, Shengfei [1]
Wang, Hongzhi [1]
Affiliations
[1] Harbin Institute of Technology, School of Computer Science and Technology, Harbin, People's Republic of China
Keywords
Data quality management; MapReduce; Bloom filter; Join
DOI
10.1007/978-3-319-32055-7_27
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Data quality management is an essential problem for information systems. As a basic operation of data quality management, joins on large-scale data play an important role in document clustering. MapReduce is a programming model that is commonly used to process large-scale data. Many tasks can be implemented under this framework, such as search-engine data processing and machine learning. However, current implementations of MapReduce offer no efficient support for the join operation. In this paper, we present strategies for building an extended Bloom filter over a large dataset using MapReduce. We use the extended Bloom filter to improve the performance of two-way and multi-way joins.
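The general idea of using a Bloom filter to cut join cost can be illustrated with a minimal single-machine sketch in Python. It uses a plain Bloom filter, not the extended Bloom filter of the paper, whose construction the abstract does not describe, and it does not use MapReduce. The names `BloomFilter` and `bloom_join`, the choice of SHA-256 as the hash, and the default sizes are assumptions made for illustration only.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter (illustrative; not the paper's extended variant)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # May report false positives, never false negatives.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def bloom_join(small, large, num_bits=1024, num_hashes=3):
    """Two-way equi-join of (key, value) pairs with Bloom pre-filtering."""
    bf = BloomFilter(num_bits, num_hashes)
    index = {}
    for key, value in small:
        bf.add(key)
        index.setdefault(key, []).append(value)

    # Drop rows of the large side whose key cannot match. In a distributed
    # setting this is the step that would reduce shuffled data.
    survivors = [(k, v) for k, v in large if k in bf]

    # The exact lookup removes any false positives the filter let through.
    return [(k, sv, lv) for k, lv in survivors for sv in index.get(k, [])]
```

For example, `bloom_join([(1, "a"), (2, "b")], [(2, "x"), (3, "y"), (2, "z")])` keeps only the rows with key 2 from the larger input before the exact match is performed. The filter is built from the smaller input because its size, and so the cost of sending it to every worker, grows with the number of distinct keys inserted.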
Pages: 329-342
Number of pages: 14