Optimizing entity join queries when data transmission cost dominates

Cited by: 2
Authors
Tsai, PSM
Chen, ALP
Affiliations
[1] NATL TSING HUA UNIV,DEPT COMP SCI,HSINCHU 300,TAIWAN
[2] MING HSIN INST TECHNOL & COMERCE,DEPT INFORMAT MANAGEMENT,HSINCHU 304,TAIWAN
Keywords
entity join; extended semijoin; inconsistent data; local processing; multidatabase; query optimization; query transformation; selectivity
DOI
10.1016/S0169-023X(96)00052-3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Heterogeneities exist in a multidatabase environment. For example, a real-world entity may be represented differently in relations of different databases. In particular, keys of these relations may be incompatible. In this paper, we consider processing entity join queries when data transmission cost dominates. An entity join operation 'integrates' tuples representing the same entities from different relations in which inconsistent data may exist. A natural way to process the entity join is to transmit both relations to a site, resolve the possible conflicts between corresponding attributes and process the join, which is very costly. In this paper, an approach is proposed to correctly transform a global query into local subqueries to preprocess entity join queries at multiple sites in an attempt to lower the cost of data transmission. In addition, an extension of the traditional semijoin, named extended semijoin, is proposed to further reduce the cost of data transmission for entity join query processing.
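For orientation, the traditional semijoin that the abstract refers to can be sketched as follows. This is only a minimal illustration of the classic technique under simplifying assumptions: keys are compatible across sites, the join runs at the site holding `R`, and cost is counted as the number of attribute values shipped. The relations `R` and `S`, the helper names, and the cost measure are all hypothetical; the paper's extended semijoin, which targets incompatible keys and inconsistent data, is not reproduced here because the abstract does not give its details.

```python
from typing import Dict, List, Set

Row = Dict[str, object]


def shipped_values(rows: List[Row]) -> int:
    """Toy transmission cost: number of attribute values sent over the network."""
    return sum(len(row) for row in rows)


def semijoin_reduce(rows: List[Row], key: str, keys: Set[object]) -> List[Row]:
    """Keep only the rows whose join key appears in the shipped key set."""
    return [row for row in rows if row[key] in keys]


def join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Plain equijoin on a shared key attribute."""
    index: Dict[object, List[Row]] = {}
    for r in right:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical data: R lives at site A, S lives at site B.
R = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
S = [
    {"id": 2, "dept": "CS"},
    {"id": 3, "dept": "EE"},
    {"id": 4, "dept": "ME"},
    {"id": 5, "dept": "CE"},
]

# Naive plan: ship all of S to site A and join there.
naive_cost = shipped_values(S)

# Semijoin plan: ship R's join keys to site B, reduce S there,
# then ship only the matching S rows back to site A.
r_keys = {row["id"] for row in R}
S_reduced = semijoin_reduce(S, "id", r_keys)
semijoin_cost = len(r_keys) + shipped_values(S_reduced)

joined = join(R, S_reduced, "id")
```

With this toy data both plans produce the same join result, while the semijoin plan ships fewer values because most rows of `S` have no partner in `R`. When nearly every row matches, the extra key shipment can make the semijoin plan the more expensive one.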
Pages: 283-308
Page count: 26