Mining combined causes in large data sets

Cited by: 12
Authors
Ma, Saisai [1]
Li, Jiuyong [1]
Liu, Lin [1]
Le, Thuc Duy [1]
Affiliation
[1] Univ S Australia, Sch Informat Technol & Math Sci, Mawson Lakes, SA 5095, Australia
Funding
Australian Research Council
Keywords
Causal discovery; Combined causes; Local causal discovery; HITON-PC; Multi-level HITON-PC; LEARNING BAYESIAN NETWORKS; ASSOCIATION; DISCOVERY; CAUSATION; MODELS;
DOI
10.1016/j.knosys.2015.10.018
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, many methods have been developed for detecting causal relationships in observational data. Some of them have the potential to tackle large data sets. However, these methods fail to discover a combined cause, i.e. a multi-factor cause consisting of two or more component variables which individually are not causes. A straightforward approach to uncovering a combined cause is to include both individual and combined variables in the causal discovery using existing methods, but this scheme is computationally infeasible due to the huge number of combined variables. In this paper, we propose a novel approach to address this practical causal discovery problem, i.e. mining combined causes in large data sets. The experiments with both synthetic and real world data sets show that the proposed method can obtain high-quality causal discoveries with a high computational efficiency. (C) 2015 Elsevier B.V. All rights reserved.
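The two points made in the abstract can be illustrated with a toy sketch. This is not the method proposed in the paper: the variable names, the XOR-style toy data, and the risk-difference measure are all assumptions chosen for illustration, and the measure shows association only, not causation. The sketch shows (a) how quickly the number of candidate combined variables grows, and (b) a case where two variables show no individual association with a target while one combination of their values does.

```python
from itertools import product
from math import comb


def count_combined_variables(n_vars, max_size):
    """Count candidate variable combinations of size 2..max_size."""
    return sum(comb(n_vars, k) for k in range(2, max_size + 1))


def risk_difference(rows, exposure, outcome):
    """P(outcome=1 | exposure=1) - P(outcome=1 | exposure=0)."""
    exposed = [r[outcome] for r in rows if r[exposure] == 1]
    unexposed = [r[outcome] for r in rows if r[exposure] == 0]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


# Hypothetical toy data: each (a, b) pair occurs once and z = a XOR b.
rows = []
for a, b in product([0, 1], repeat=2):
    rows.append({
        "a": a,
        "b": b,
        "a1_b0": int(a == 1 and b == 0),  # one combined variable
        "z": a ^ b,
    })

# Combinatorial growth of the naive "include every combination" scheme.
pairs_only = count_combined_variables(100, 2)
up_to_triples = count_combined_variables(100, 3)

# Individually, neither a nor b is associated with z in this toy data,
# but the combined variable (a=1 and b=0) is.
rd_a = risk_difference(rows, "a", "z")
rd_b = risk_difference(rows, "b", "z")
rd_combined = risk_difference(rows, "a1_b0", "z")

if __name__ == "__main__":
    print(pairs_only, up_to_triples)
    print(rd_a, rd_b, round(rd_combined, 3))
```

With 100 variables, pairs alone already give thousands of candidates, and the count grows rapidly with combination size, which is why the abstract calls the straightforward scheme computationally infeasible.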
Pages: 104-111
Number of pages: 8