Improving Markov Chain Monte Carlo model search for data mining

Cited by: 92
Authors
Giudici, P
Castelo, R
Affiliations
[1] Univ Pavia, Dept Econ & Quantitat Methods, I-27100 Pavia, Italy
[2] Univ Utrecht, Inst Comp & Informat Sci, NL-3508 TC Utrecht, Netherlands
Keywords
Bayesian structural learning; convergence diagnostics; Dirichlet distribution; market basket analysis; Markov chain Monte Carlo
DOI
10.1023/A:1020202028934
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The motivation of this paper is the application of MCMC model scoring procedures to data mining problems, which involve a large number of competing models and other relevant model choice aspects. To achieve this aim we analyze one of the most popular Markov chain Monte Carlo methods for structural learning in graphical models, namely, the MC3 algorithm proposed by D. Madigan and J. York (International Statistical Review, 63, 215-232, 1995). Our aim is to improve their algorithm to make it an effective and reliable tool in the field of data mining. In such a context, typically high-dimensional in the number of variables, little can be known a priori and, therefore, a good model search algorithm is crucial. We present and describe in detail our implementation of the MC3 algorithm, which provides an efficient general framework for computations with both Directed Acyclic Graphical (DAG) models and Undirected Decomposable Models (UDG). We believe that the possibility of commuting easily between the two classes of models constitutes an important asset in data mining, where a priori knowledge of causal effects is usually difficult to establish. Furthermore, in order to improve the MC3 method, we provide several graphical monitors which can help in extracting results and in assessing the goodness of the Markov chain Monte Carlo approximation to the posterior distribution of interest. We apply our proposed methodology first to the well-known coronary heart disease dataset (D. Edwards & T. Havranek, Biometrika, 72:2, 339-351, 1985). We then introduce a novel data mining application which concerns market basket analysis.
Pages: 127-158
Page count: 32
Related papers
50 in total
  • [1] Improving Markov Chain Monte Carlo Model Search for Data Mining
    Paolo Giudici
    Robert Castelo
    Machine Learning, 2003, 50 : 127 - 158
  • [2] Improving Operational Intensity in Data Bound Markov Chain Monte Carlo
    Nemeth, Balazs
    Haber, Tom
    Ashby, Thomas J.
    Lamotte, Wim
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS 2017), 2017, 108 : 2348 - 2352
  • [3] Improving Markov Chain Monte Carlo algorithms in LISA Pathfinder Data Analysis
    Karnesis, N.
    Nofrarias, M.
    Sopuerta, C. F.
    Lobo, A.
    9TH EDOARDO AMALDI CONFERENCE ON GRAVITATIONAL WAVES (AMALDI 9) AND THE 2011 NUMERICAL RELATIVITY - DATA ANALYSIS MEETING (NRDA 2011), 2012, 363
  • [4] A Markov chain Monte Carlo algorithm for Bayesian policy search
    Aghaei, Vahid Tavakol
    Onat, Ahmet
    Yildirim, Sinan
    SYSTEMS SCIENCE & CONTROL ENGINEERING, 2018, 6 (01): : 438 - 455
  • [5] On Markov chain Monte Carlo methods for tall data
    Bardenet, Remi
    Doucet, Arnaud
    Holmes, Chris
    JOURNAL OF MACHINE LEARNING RESEARCH, 2017, 18 : 1 - 43
  • [7] Markov Chain Monte Carlo
    Henry, Ronnie
    EMERGING INFECTIOUS DISEASES, 2019, 25 (12) : 2298 - 2298
  • [8] Monte Carlo Tennis: A Stochastic Markov Chain Model
    Newton, Paul K.
    Aslam, Kamran
    JOURNAL OF QUANTITATIVE ANALYSIS IN SPORTS, 2009, 5 (03)
  • [9] Markov chain Monte Carlo analysis of correlated count data
    Chib, S
    Winkelmann, R
    JOURNAL OF BUSINESS & ECONOMIC STATISTICS, 2001, 19 (04) : 428 - 435
  • [10] A MARGINALISED MARKOV CHAIN MONTE CARLO APPROACH FOR MODEL BASED ANALYSIS OF EEG DATA
    Hettiarachchi, Imali
    Mohamed, Shady
    Nahavandi, Saeid
    2012 9TH IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI), 2012, : 1539 - 1542