Query Optimization for Dynamic Imputation

Cited by: 21
Authors
Cambronero, Jose [1]
Feser, John K. [1]
Smith, Micah J. [2]
Madden, Samuel [1]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] MIT, LIDS, Cambridge, MA 02139 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2017 / Vol. 10 / No. 11
DOI
10.14778/3137628.3137641
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Missing values are common in data analysis and present a usability challenge. Users are forced to pick between removing tuples with missing values or creating a cleaned version of their data by applying a relatively expensive imputation strategy. Our system, ImputeDB, incorporates imputation into a cost-based query optimizer, performing necessary imputations on-the-fly for each query. This allows users to immediately explore their data, while the system picks the optimal placement of imputation operations. We evaluate this approach on three real-world survey-based datasets. Our experiments show that our query plans execute between 10 and 140 times faster than first imputing the base tables. Furthermore, we show that the query results from on-the-fly imputation differ from the traditional base-table imputation approach by 0-8%. Finally, we show that while dropping tuples with missing values that fail query constraints discards 6-78% of the data, on-the-fly imputation loses only 0-21%.
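The trade-off the abstract describes, between imputing the base table up front and placing imputation inside the query plan, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy table `ROWS`, the query (average `income` where `age > 30`), the helper names, and the use of mean imputation as a stand-in for whatever imputation method the system actually uses. The count of imputed cells serves as a crude proxy for imputation cost.

```python
from statistics import mean

# Hypothetical toy table (not from the paper); None marks a missing value.
ROWS = [
    {"age": 25, "income": 30.0},
    {"age": 40, "income": None},
    {"age": None, "income": 50.0},
    {"age": 55, "income": 70.0},
    {"age": 35, "income": None},
    {"age": 20, "income": None},
]


def impute_mean(rows, column):
    """Fill missing values in `column` with the mean of the observed values.

    Returns (new_rows, number_of_imputed_cells). Mean imputation is an
    illustrative stand-in, not the method claimed for any real system.
    """
    observed = [r[column] for r in rows if r[column] is not None]
    fill = mean(observed)
    out, n_imputed = [], 0
    for r in rows:
        if r[column] is None:
            r = {**r, column: fill}  # copy the row; leave the input untouched
            n_imputed += 1
        out.append(r)
    return out, n_imputed


def drop_missing(rows, column):
    """Discard tuples whose value in `column` is missing."""
    return [r for r in rows if r[column] is not None]


def plan_impute_base_table(rows):
    """Impute every column of the base table first, then run the query."""
    rows, n_age = impute_mean(rows, "age")
    rows, n_income = impute_mean(rows, "income")
    selected = [r for r in rows if r["age"] > 30]
    return mean(r["income"] for r in selected), n_age + n_income


def plan_on_the_fly(rows):
    """Drop tuples missing the filter attribute, filter, then impute only
    the aggregated column on the tuples that survive the filter."""
    rows = drop_missing(rows, "age")
    selected = [r for r in rows if r["age"] > 30]
    selected, n_income = impute_mean(selected, "income")
    return mean(r["income"] for r in selected), n_income


if __name__ == "__main__":
    for plan in (plan_impute_base_table, plan_on_the_fly):
        result, cells = plan(ROWS)
        print(f"{plan.__name__}: avg income = {result}, imputed cells = {cells}")
```

On this toy table the second plan imputes fewer cells but returns a different average, which mirrors in miniature the speed versus result-difference trade-off the abstract reports. A cost-based optimizer would choose among such placements rather than hard-coding one.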
Pages: 1310 - 1321
Page count: 12
Related papers
50 in total
  • [1] Dynamic query optimization approach for semantic database grid
    Zheng, Xiao-Qin
    Chen, Hua-Jun
    Wu, Zhao-Hui
    Mao, Yu-Xin
    [J]. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2006, 21 (04) : 597 - 608
  • [2] A dynamic virtual fragmentation method for query recovery optimization
    Vázquez, JA
    [J]. XX INTERNATIONAL CONFERENCE OF THE CHILEAN COMPUTER SCIENCE SOCIETY - PROCEEDINGS, 2000, : 50 - 57
  • [3] Dynamic Query Optimization Approach for Semantic Database Grid
    Xiao-Qing Zheng
    Hua-Jun Chen
    Zhao-Hui Wu
    Yu-Xin Mao
    [J]. Journal of Computer Science and Technology, 2006, 21 : 597 - 608
  • [4] ANALYSIS OF A DYNAMIC QUERY OPTIMIZATION TECHNIQUE FOR MULTIJOIN QUERIES
    VANDENBERG, CA
    KERSTEN, ML
    [J]. JOURNAL OF SYSTEMS AND SOFTWARE, 1994, 27 (03) : 233 - 241
  • [5] Dynamic programming solution for multiple query optimization problem
    Toroslu, IH
    Cosar, A
    [J]. INFORMATION PROCESSING LETTERS, 2004, 92 (03) : 149 - 155
  • [6] Semantic Stream Query Optimization Exploiting Dynamic Metadata
    Ding, Luping
    Works, Karen
    Rundensteiner, Elke A.
    [J]. IEEE 27TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2011), 2011, : 111 - 122
  • [7] Dynamic Query Optimization under Access Limitations and Dependencies
    Cali, Andrea
    Calvanese, Diego
    Martinenghi, Davide
    [J]. JOURNAL OF UNIVERSAL COMPUTER SCIENCE, 2009, 15 (01) : 33 - 62
  • [8] ZIP: Lazy Imputation during Query Processing
    Lin, Yiming
    Mehrotra, Sharad
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2023, 17 (01) : 28 - 40
  • [9] Query-Based Learning for Dynamic Particle Swarm Optimization
    Chang, Ray-I
    Hsu, Hung-Min
    Lin, Shu-Yu
    Chang, Chu-Chun
    Ho, Jan-Ming
    [J]. IEEE ACCESS, 2017, 5 : 7648 - 7658
  • [10] A Distributed DBMS Based Dynamic Programming Method for Query Optimization
    孙纪舟
    李阳
    蒋志勇
    顾云苏
    何清法
    [J]. Journal of Donghua University(English Edition), 2012, 29 (01) : 55 - 58