The Monte Carlo Database System: Stochastic Analysis Close to the Data

Cited by: 22
Authors
Jampani, Ravi [2 ]
Xu, Fei [3 ]
Wu, Mingxi [4 ]
Perez, Luis [1 ]
Jermaine, Chris [1 ]
Haas, Peter J. [5 ]
Affiliations
[1] Rice Univ, Dept Comp Sci, Houston, TX 77005 USA
[2] Univ Florida, Gainesville, FL 32611 USA
[3] Microsoft Corp, Redmond, WA 98052 USA
[4] Oracle Corp, Redwood Shores, CA 94065 USA
[5] IBM Almaden Res Ctr, Armonk, NY 10504 USA
Source
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2011, Vol. 36, No. 3
Funding
National Science Foundation (USA);
Keywords
Algorithms; Performance; MCDB; relational database systems; uncertainty; UNCERTAIN; INTEGRATION;
DOI
10.1145/2000824.2000828
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812 ;
Abstract
The application of stochastic models and analysis techniques to large datasets is now commonplace. Unfortunately, in practice this usually means extracting data from a database system into an external tool (such as SAS, R, Arena, or Matlab), and then running the analysis there. This extract-and-model paradigm is typically error-prone, slow, does not support fine-grained modeling, and discourages what-if and sensitivity analyses. In this article we describe MCDB, a database system that permits a wide spectrum of stochastic models to be used in conjunction with the data stored in a large database, without ever extracting the data. MCDB facilitates in-database execution of tasks such as risk assessment, prediction, and imputation of missing data, as well as management of errors due to data integration, information extraction, and privacy-preserving data anonymization. MCDB allows a user to define "random" relations whose contents are determined by stochastic models. The models can then be queried using standard SQL. Monte Carlo techniques are used to analyze the probability distribution of the result of an SQL query over random relations. Novel "tuple-bundle" processing techniques can effectively control the Monte Carlo overhead, as shown in our experiments.
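The workflow the abstract describes (define a random relation from a stochastic model, run an ordinary query over it, and use Monte Carlo repetition to estimate the distribution of the query result) can be sketched in a few lines of plain Python. This is only an illustration of the general idea, not MCDB's implementation or syntax: the `CUSTOMERS` table, the normal demand model, and all function names are hypothetical, and the naive loop below regenerates the whole relation on every iteration, which is the overhead the paper's "tuple-bundle" techniques are meant to control.

```python
import random
import statistics

# Hypothetical deterministic table: (customer_id, mean_demand, std_demand).
# These rows are made up for illustration; they do not come from the paper.
CUSTOMERS = [(1, 100.0, 10.0), (2, 250.0, 40.0), (3, 80.0, 5.0)]


def sample_random_relation(rng):
    """Draw one realization of a 'random' relation: rows of (customer_id, demand).

    Each demand value is sampled from a normal model parameterized by the
    stored row (an assumed model, chosen only to keep the sketch short).
    """
    return [(cid, rng.gauss(mu, sigma)) for cid, mu, sigma in CUSTOMERS]


def query_total_demand(relation):
    """Stand-in for an aggregate query such as SELECT SUM(demand) FROM relation."""
    return sum(demand for _, demand in relation)


def monte_carlo(n_iterations, seed=0):
    """Run the query over n_iterations independent realizations of the relation."""
    rng = random.Random(seed)
    return [
        query_total_demand(sample_random_relation(rng))
        for _ in range(n_iterations)
    ]


# Empirical distribution of the query result across Monte Carlo iterations.
results = monte_carlo(2000)
mean_total = statistics.mean(results)
std_total = statistics.stdev(results)
print(f"estimated total demand: mean={mean_total:.1f}, std={std_total:.1f}")
```

Because the query answer is itself a random variable, the output is a sample of answers rather than a single number; summary statistics such as the mean and standard deviation above are one way to describe that sample.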
Pages: 41