Probabilistic Database Summarization for Interactive Data Exploration

Times cited: 10
Authors
Orr, Laurel [1]
Balazinska, Magdalena [1]
Suciu, Dan [1]
Affiliation
[1] Univ Washington, Seattle, WA 98195 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2017 / Vol. 10 / No. 10
Funding
US National Science Foundation
DOI
10.14778/3115404.3115419
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present a probabilistic approach to generate a small, query-able summary of a dataset for interactive data exploration. Departing from traditional summarization techniques, we use the Principle of Maximum Entropy to generate a probabilistic representation of the data that can be used to give approximate query answers. We develop the theoretical framework and formulation of our probabilistic representation and show how to use it to answer queries. We then present solving techniques and give three critical optimizations to improve preprocessing time and query accuracy. Lastly, we experimentally evaluate our work using a 5 GB dataset of flights within the United States and a 210 GB dataset from an astronomy particle simulation. While our current work only supports linear queries, we show that our technique can successfully answer queries faster than sampling while introducing, on average, no more error than sampling and can better distinguish between rare and nonexistent values.
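The abstract describes the approach only at a high level. The Python sketch below is a rough illustration of the underlying idea. It fits a maximum-entropy distribution over a toy two-attribute domain subject to linear constraints: single-attribute counts plus one two-attribute count. It then answers a linear COUNT query from that distribution alone, without touching the rows again. Everything here is an assumption for illustration, including the helper names (`fit_maxent`, `marginal_constraints`, `estimate_count`) and the made-up flight rows. The solver is also a swapped-in technique: it uses cyclic I-projection (iterative proportional fitting), which is expected to converge to the maximum-entropy solution for consistent linear constraints like these. It is not the authors' solving technique, and it does not reproduce their optimizations.

```python
from itertools import product


def fit_maxent(cells, constraints, iters=2000):
    """Hypothetical helper: max-entropy fit by cyclic I-projection (IPF-style).

    Starts from the uniform distribution (the maximum-entropy point) and
    repeatedly enforces each constraint (subset_of_cells, target_probability):
    the subset is rescaled to the target mass and its complement to
    1 - target, which is the exact I-projection onto that single constraint.
    """
    p = {c: 1.0 / len(cells) for c in cells}
    for _ in range(iters):
        for subset, target in constraints:
            mass = sum(p[c] for c in cells if c in subset)
            rest = sum(p[c] for c in cells if c not in subset)
            if mass == 0.0 or rest == 0.0:
                continue  # subset (or complement) already at zero mass
            for c in cells:
                if c in subset:
                    p[c] *= target / mass
                else:
                    p[c] *= (1.0 - target) / rest
    return p


def marginal_constraints(rows, cells, attr_index, domain):
    """Hypothetical helper: one linear constraint per value of one attribute."""
    n = len(rows)
    out = []
    for value in domain:
        subset = {c for c in cells if c[attr_index] == value}
        count = sum(1 for r in rows if r[attr_index] == value)
        out.append((subset, count / n))
    return out


def estimate_count(p, n, predicate):
    """Answer the linear query COUNT(*) WHERE predicate from the summary alone."""
    return n * sum(prob for cell, prob in p.items() if predicate(cell))


# Toy, made-up data: (origin, carrier) pairs; JFK never occurs.
ORIGINS = ["SEA", "SFO", "JFK"]
CARRIERS = ["AA", "DL", "UA"]
ROWS = [("SEA", "AA"), ("SEA", "DL"), ("SEA", "DL"), ("SEA", "UA"),
        ("SFO", "AA"), ("SFO", "UA")]
CELLS = list(product(ORIGINS, CARRIERS))
N = len(ROWS)

one_d = (marginal_constraints(ROWS, CELLS, 0, ORIGINS)
         + marginal_constraints(ROWS, CELLS, 1, CARRIERS))
# One hypothetical two-attribute statistic: the exact share of (SEA, AA) rows.
sea_aa = ({("SEA", "AA")}, sum(r == ("SEA", "AA") for r in ROWS) / N)

p_indep = fit_maxent(CELLS, one_d)           # 1D statistics only
p_2d = fit_maxent(CELLS, one_d + [sea_aa])   # 1D statistics plus (SEA, AA)

is_sea_aa = lambda c: c == ("SEA", "AA")
print(round(estimate_count(p_indep, N, is_sea_aa), 3))  # 1.333
print(round(estimate_count(p_2d, N, is_sea_aa), 3))     # 1.0
print(estimate_count(p_2d, N, lambda c: c[0] == "JFK"))  # 0.0
```

With only single-attribute statistics, the maximum-entropy solution reduces to attribute independence. The (SEA, AA) estimate is therefore 6 × 4/6 × 2/6 ≈ 1.333. Adding the two-attribute statistic pins it to the true count of 1. Because JFK has a zero count, its estimate is exactly 0, a small-scale echo of the rare-versus-nonexistent distinction the abstract mentions.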
Pages: 1154-1165
Number of pages: 12
Related papers (50 in total)
  • [1] Interactive Summarization and Exploration of Top Aggregate Query Answers
    Wen, Yuhao
    Zhu, Xiaodan
    Roy, Sudeepa
    Yang, Jun
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2018, 11 (13): 2196-2208
  • [2] Interactive Visual Summarization of Multidimensional Data
    Kocherlakota, Sarat M.
    Healey, Christopher G.
    2009 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2009), VOLS 1-9, 2009: 362-+
  • [3] Query Recommendations for Interactive Database Exploration
    Chatzopoulou, Gloria
    Eirinaki, Magdalini
    Polyzotis, Neoklis
    SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, PROCEEDINGS, 2009, 5566: 3-+
  • [4] Interactive data exploration
    Arthur, GC
    THIRD REGIONAL APCOM: COMPUTER APPLICATIONS IN THE MINERALS INDUSTRIES INTERNATIONAL SYMPOSIUM, 1998, 98 (05): 45-48
  • [5] InSide: interactive sketching for image database exploration
    Zhang, Hongxin
    Liu, Dongyu
    Wang, Changhan
    2013 INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN AND COMPUTER GRAPHICS (CAD/GRAPHICS), 2013: 423-424
  • [6] INDIANA: An interactive system for assisting database exploration
    Giuzio, Antonio
    Mecca, Giansalvatore
    Quintarelli, Elisa
    Roveri, Manuel
    Santoro, Donatello
    Tanca, Letizia
    INFORMATION SYSTEMS, 2019, 83: 40-56
  • [7] Interactive database for decay data
    Be, M.M.
    Duchemin, B.
    Lame, J.
    Nuclear Instruments & Methods in Physics Research, Section A: Accelerators, Spectrometers, Detectors and Associated Equipment, 1996, 369 (2-3):
  • [8] Discovering Communities and Anomalies in Attributed Graphs: Interactive Visual Exploration and Summarization
    Perozzi, Bryan
    Akoglu, Leman
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2018, 12 (02)
  • [9] An interactive database for decay data
    Be, MM
    Duchemin, B
    Lame, J
    NUCLEAR INSTRUMENTS & METHODS IN PHYSICS RESEARCH SECTION A-ACCELERATORS SPECTROMETERS DETECTORS AND ASSOCIATED EQUIPMENT, 1996, 369 (2-3): 523-526
  • [10] Exploration of Interactive Teaching Mode in "Database Theory" Course
    Li, Ping
    PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON MANAGEMENT, EDUCATION, INFORMATION AND CONTROL (MEICI 2017), 2017, 156: 5-8