Join Query Processing in Data Quality Management

Times cited: 3
Authors
Yue, Mingliang [1]
Gao, Hong [1]
Shi, Shengfei [1]
Wang, Hongzhi [1]
Affiliation
[1] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
Keywords
Data quality management; MapReduce; Bloom filter; Join;
DOI
10.1007/978-3-319-32055-7_27
CLC number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Data quality management is an essential problem for information systems. As a basic operation in data quality management, joins on large-scale data play an important role in document clustering. MapReduce is a programming model commonly used to process large-scale data, and many tasks, such as search-engine data processing and machine learning, can be implemented within this framework. However, current implementations of MapReduce provide no efficient support for join operations. In this paper, we present a strategy for building an extended Bloom filter over a large dataset using MapReduce, and we use the extended Bloom filter to improve the performance of two-way and multi-way joins.
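The abstract does not say what makes the authors' Bloom filter "extended", so the following is only a minimal sketch of the general Bloom-join idea, assuming a plain Bloom filter and simulating the MapReduce phases in a single process. Each mapper builds a local filter over the join keys of one partition of relation R, a reducer merges the local filters with a bitwise OR, and tuples of relation S whose key fails the membership test are dropped before the (simulated) shuffle of a reduce-side join. The filter size, hash count, and all function names (`map_build_filter`, `reduce_merge_filters`, `bloom_join`, etc.) are hypothetical.

```python
import hashlib
from collections import defaultdict
from functools import reduce

M_BITS = 1024  # hypothetical filter size in bits
K_HASHES = 3   # hypothetical number of hash functions


def bit_positions(key, m=M_BITS, k=K_HASHES):
    """Return k bit positions for a join key, one salted SHA-256 digest per hash."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{key}".encode()).digest()[:8], "big") % m
        for i in range(k)
    ]


def map_build_filter(partition):
    """Map phase (hypothetical): build a local Bloom filter, stored as an int bitmask, over one partition of R."""
    bits = 0
    for key, _value in partition:
        for pos in bit_positions(key):
            bits |= 1 << pos
    return bits


def reduce_merge_filters(local_filters):
    """Reduce phase (hypothetical): OR the local filters into one filter covering all of R."""
    return reduce(lambda a, b: a | b, local_filters, 0)


def might_contain(bits, key):
    """Membership test: False means definitely absent; True may be a false positive."""
    return all((bits >> pos) & 1 for pos in bit_positions(key))


def bloom_join(r_partitions, s_partitions):
    """Two-way equi-join of R and S on the key, pruning S tuples before the shuffle."""
    r_filter = reduce_merge_filters(map_build_filter(p) for p in r_partitions)

    # Simulated shuffle: group values by join key, as a reduce-side join would.
    groups = defaultdict(lambda: ([], []))
    for part in r_partitions:
        for key, value in part:
            groups[key][0].append(value)
    for part in s_partitions:
        for key, value in part:
            if might_contain(r_filter, key):  # S tuples failing the test are never shuffled
                groups[key][1].append(value)

    # Reduce phase: pair every R value with every S value sharing the same key.
    # A false-positive S key has an empty R list, so it produces no output.
    return [
        (key, rv, sv)
        for key, (r_vals, s_vals) in groups.items()
        for rv in r_vals
        for sv in s_vals
    ]


if __name__ == "__main__":
    R = [[(1, "a"), (2, "b")], [(3, "c")]]            # two partitions of R
    S = [[(2, "x"), (4, "y")], [(3, "z"), (5, "w")]]  # two partitions of S
    print(sorted(bloom_join(R, S)))  # [(2, 'b', 'x'), (3, 'c', 'z')]
```

Because a Bloom filter has no false negatives, the pruning never loses a matching tuple; false positives only cost some extra shuffle traffic. For a multi-way join, one plausible extension (an assumption, not necessarily the paper's method) is to build one such filter per join attribute and apply each before the corresponding shuffle.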
Pages: 329-342
Number of pages: 14
Related papers
50 in total
  • [1] Efficient distance join query processing in distributed spatial data management systems
    Garcia-Garcia, Francisco
    Corral, Antonio
    Iribarne, Luis
    Vassilakopoulos, Michael
    Manolopoulos, Yannis
    [J]. INFORMATION SCIENCES, 2020, 512 : 985 - 1008
  • [2] Distributed Join Query Processing for Big RDF Data
    Elzein, Nahla Mohammed
    Majid, Mazlina Abdul
    Fakherldin, Mohammed
    Hashem, Ibrahim Abaker Targio
    [J]. ADVANCED SCIENCE LETTERS, 2018, 24 (10) : 7758 - 7761
  • [3] Distributed multi-join query processing in data grids
    Yang, Donghua
    Li, Hanzhong
    [J]. INFORMATION SCIENCES, 2007, 177 (17) : 3574 - 3591
  • [4] Storing Join Relationships for Fast Join Query Processing
    Hamdi, Mohammed
    Yu, Feng
    Alswedani, Sarah
    Hou, Wen-Chi
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2017, PT I, 2017, 10438 : 167 - 177
  • [5] An Algorithm for Distributed Aggregation-join Query Processing in Data Grids
    Feng, Hua
    Zhang, Zhenhuan
    [J]. INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY, 2008, 8 (05) : 102 - 110
  • [6] Multi-table join algorithm for data warehouse query processing
    Jiang, X.D.
    Zhou, L.Z.
    [J]. Ruan Jian Xue Bao/Journal of Software, 2001, 12 (02) : 190 - 195
  • [7] Optimizing Communication for Multi-Join Query Processing in Cloud Data Warehouses
    Kurunji, Swathi
    Ge, Tingjian
    Fu, Xinwen
    Liu, Benyuan
    Chen, Cindy X.
    [J]. INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2013, 5 (04) : 113 - 130
  • [8] Causality Join Query Processing for Data Streams via a Spatiotemporal Sliding Window
    Kwon, Oje
    Li, Ki-Joune
    [J]. JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2009, 15 (12) : 2287 - 2310
  • [9] Web/XML data management and query processing
    Zhou, AY
    Zheng, SH
    Qian, WN
    [J]. WORLD WIDE WEB TECHNOLOGIES IN CHINA: RESEARCH, DEVELOPMENT, AND APPLICATIONS, 2002 : 95 - 115
  • [10] Distributed stream join query processing with semijoins
    Tran, Tri Minh
    Lee, Byung Suk
    [J]. DISTRIBUTED AND PARALLEL DATABASES, 2010, 27 (03) : 211 - 254