Mining combined causes in large data sets

Cited by: 12
Authors
Ma, Saisai [1]
Li, Jiuyong [1]
Liu, Lin [1]
Thuc Duy Le [1]
Affiliations
[1] University of South Australia, School of Information Technology and Mathematical Sciences, Mawson Lakes, SA 5095, Australia
Funding
Australian Research Council
Keywords
Causal discovery; Combined causes; Local causal discovery; HITON-PC; Multi-level HITON-PC; LEARNING BAYESIAN NETWORKS; ASSOCIATION; DISCOVERY; CAUSATION; MODELS;
DOI
10.1016/j.knosys.2015.10.018
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, many methods have been developed for detecting causal relationships in observational data, and some of them have the potential to tackle large data sets. However, these methods fail to discover a combined cause, i.e., a multi-factor cause consisting of two or more component variables that individually are not causes. A straightforward approach to uncovering a combined cause is to include both individual and combined variables in causal discovery with existing methods, but this scheme is computationally infeasible because of the huge number of combined variables. In this paper, we propose a novel approach to this practical causal discovery problem, i.e., mining combined causes in large data sets. Experiments with both synthetic and real-world data sets show that the proposed method obtains high-quality causal discoveries with high computational efficiency. (C) 2015 Elsevier B.V. All rights reserved.
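The two points the abstract makes, that component variables can show no individual effect while their combination does, and that enumerating all combined variables quickly becomes infeasible, can be illustrated with a small sketch. This is not the paper's algorithm: the XOR toy population, the risk-difference measure, and the helper names `count_combined_variables` and `risk_difference` are assumptions chosen only for illustration, and the sketch measures association on a toy population rather than performing causal discovery.

```python
from itertools import product
from math import comb


def count_combined_variables(n_vars: int, max_size: int) -> int:
    """Count variable subsets of size 2..max_size (candidate combined variables)."""
    return sum(comb(n_vars, k) for k in range(2, max_size + 1))


def risk_difference(rows, condition, target="Z"):
    """P(target=1 | condition holds) - P(target=1 | condition fails), rows equally likely."""
    met = [r[target] for r in rows if condition(r)]
    unmet = [r[target] for r in rows if not condition(r)]
    return sum(met) / len(met) - sum(unmet) / len(unmet)


# Hypothetical toy population: X and Y are independent fair binary variables
# and Z = X XOR Y, so neither X nor Y alone changes the probability of Z.
rows = [{"X": x, "Y": y, "Z": x ^ y} for x, y in product([0, 1], repeat=2)]

# Individual variable: no difference in P(Z=1) between X=1 and X=0.
rd_x = risk_difference(rows, lambda r: r["X"] == 1)

# Combined variable (X=1 and Y=0): P(Z=1) is 1 when it holds, 1/3 otherwise.
rd_xy = risk_difference(rows, lambda r: r["X"] == 1 and r["Y"] == 0)

# Size of the naive search space: all pairs and triples of 100 variables.
n_candidates = count_combined_variables(100, 3)

print(rd_x, rd_xy, n_candidates)
```

With only 100 variables and combinations of at most three components, the naive scheme already has 166,650 variable subsets to consider, before accounting for the different value assignments each subset can take.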
Pages: 104-111
Number of pages: 8