Selectivity Estimation of Inequality Joins in Databases

Cited by: 1
Authors
Repas, Diogo [1]
Luo, Zhicheng [1]
Schoemans, Maxime [1]
Sakr, Mahmoud [1, 2]
Affiliations
[1] Univ libre Bruxelles ULB, Data Sci Lab, B-1050 Brussels, Belgium
[2] Ain Shams Univ, Fac Comp & Informat Sci, Cairo 11566, Egypt
Keywords
SQL; query optimization; optimizer statistics
DOI
10.3390/math11061383
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Selectivity estimation refers to the ability of the SQL query optimizer to estimate the size of the results of a predicate in the query. It is the main calculation based on which the optimizer can select the least expensive plan to execute. While the problem has been known since the mid-1970s, we were surprised that there are no solutions in the literature for the selectivity estimation of inequality joins. By testing four common database systems (Oracle, SQL-Server, PostgreSQL, and MySQL), we found that the open-source systems PostgreSQL and MySQL lack this estimation. Oracle and SQL-Server make fairly accurate estimations, yet their algorithms are secret. This paper, thus, proposes an algorithm for inequality join selectivity estimation. The proposed algorithm was implemented in PostgreSQL and sent as a patch to be included in the next releases. We compared this implementation with the above DBMSs for three different data distributions (uniform, normal, and Zipfian) and showed that our algorithm provides extremely accurate estimations (below 0.1% average error), outperforming the other systems by an order of magnitude.
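To make the quantity in the abstract concrete, here is a minimal Python sketch of what the selectivity of an inequality join `R.a < S.b` means and how it might be approximated from per-column statistics. It is not the algorithm proposed in the paper, which this record does not describe. The function names, the use of equi-depth histogram bounds, and the midpoint approximation are all illustrative assumptions.

```python
import bisect


def exact_lt_join_selectivity(r_values, s_values):
    """Fraction of pairs (a, b) in R x S that satisfy a < b."""
    if not r_values or not s_values:
        return 0.0
    sorted_r = sorted(r_values)
    # For each b, bisect_left counts the values of R strictly below b.
    matching = sum(bisect.bisect_left(sorted_r, b) for b in s_values)
    return matching / (len(r_values) * len(s_values))


def equi_depth_bounds(values, buckets):
    """Bucket boundaries chosen so each bucket holds roughly the same number of rows."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return [ordered[(i * last) // buckets] for i in range(buckets + 1)]


def fraction_below(bounds, x):
    """Estimated P(value < x), interpolating linearly inside the bucket containing x."""
    if x <= bounds[0]:
        return 0.0
    if x >= bounds[-1]:
        return 1.0
    buckets = len(bounds) - 1
    i = bisect.bisect_right(bounds, x) - 1  # bounds[i] <= x < bounds[i + 1]
    lo, hi = bounds[i], bounds[i + 1]
    return (i + (x - lo) / (hi - lo)) / buckets


def estimated_lt_join_selectivity(r_bounds, s_bounds):
    """Assumed approximation: average R's estimated CDF over the midpoints of S's buckets."""
    buckets = len(s_bounds) - 1
    total = 0.0
    for j in range(buckets):
        midpoint = (s_bounds[j] + s_bounds[j + 1]) / 2
        total += fraction_below(r_bounds, midpoint)
    return total / buckets


# Hypothetical data: two columns holding the integers 0..99.
r = list(range(100))
s = list(range(100))
exact = exact_lt_join_selectivity(r, s)
estimate = estimated_lt_join_selectivity(equi_depth_bounds(r, 10), equi_depth_bounds(s, 10))
```

The exact computation has to sort and scan the data, whereas the estimate reads only a handful of histogram boundaries per column. That is the trade-off an optimizer faces when it must cost a plan before executing it.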
Pages: 18
Related papers
50 in total
  • [1] Selectivity estimation for spatial joins
    An, N
    Yang, ZY
    Sivasubramaniam, A
    17TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, PROCEEDINGS, 2001: 368-375
  • [2] Efficient Selectivity Estimation for Relation-Tree Joins in Multi-Model Databases
    Qi, Linli
    Jin, Peiquan
    Wan, Shouhong
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021: 5998-6002
  • [3] Efficient selectivity estimation for distance joins
    Xiong, Wei
    Zhang, Ju
    Jing, Ning
    Chen, Hong-Sheng
    Guofang Keji Daxue Xuebao/Journal of National University of Defense Technology, 2004, 26(06): 82-85
  • [4] Selectivity estimation in spatial databases
    Acharya, S
    Poosala, V
    Ramaswamy, S
    SIGMOD RECORD, VOL 28, NO 2 - JUNE 1999: SIGMOD99: PROCEEDINGS OF THE 1999 ACM SIGMOD - INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 1999: 13-24
  • [5] Selectivity Estimation for Relation-Tree Joins
    Zhang, Chao
    Lu, Jiaheng
    PROCEEDINGS OF THE 32TH INTERNATIONAL CONFERENCE ON SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, SSDBM 2020, 2020,
  • [6] Selectivity estimation for joins using systematic sampling
    Harangsri, B
    Shepherd, J
    Ngu, A
    EIGHTH INTERNATIONAL WORKSHOP ON DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 1997: 384-389
  • [7] Selectivity estimation for spatial joins with geometric selections
    Sun, C
    Agrawal, D
    El Abbadi, A
    ADVANCES IN DATABASE TECHNOLOGY - EDBT 2002, 2002, 2287: 609-626
  • [8] Selectivity and cost estimation for joins based on random sampling
    Haas, PJ
    Naughton, JF
    Seshadri, S
    Swami, AN
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1996, 52(03): 550-569
  • [9] Selectivity estimation for optimizing similarity query in multimedia databases
    Lee, JH
    Chun, SJ
    Park, S
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING, 2003, 2690: 638-644
  • [10] Fast and scalable inequality joins
    Zuhair Khayyat
    William Lucia
    Meghna Singh
    Mourad Ouzzani
    Paolo Papotti
    Jorge-Arnulfo Quiané-Ruiz
    Nan Tang
    Panos Kalnis
    The VLDB Journal, 2017, 26: 125-150