EDM: A general framework for data mining based on evidence theory

Cited by: 51
Authors
Anand, SS
Bell, DA
Hughes, JG
Keywords
data mining; knowledge discovery in databases; uncertainty handling; evidence theory; parallel discovery;
DOI
10.1016/0169-023X(95)00038-T
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data Mining or Knowledge Discovery in Databases [1,15,23] is currently one of the most exciting and challenging areas where database techniques are coupled with techniques from Artificial Intelligence and mathematical sub-disciplines to great potential advantage. It has been defined as the non-trivial extraction of implicit, previously unknown and potentially useful information from data. A lot of research effort is being directed towards building tools for discovering interesting patterns which are hidden below the surface in databases. However, most of the work being done in this field has been problem-specific and no general framework has yet been proposed for Data Mining. In this paper we seek to remedy this by proposing EDM (Evidence-based Data Mining), a general framework for Data Mining based on Evidence Theory. Having a general framework for Data Mining offers a number of advantages. It provides a common method for representing knowledge which allows prior knowledge from the user or knowledge discovered by another discovery process to be incorporated into the discovery process. A common knowledge representation also supports the discovery of meta-knowledge from knowledge discovered by different Data Mining techniques. Furthermore, a general framework can provide facilities that are common to most discovery processes, e.g. incorporating domain knowledge and dealing with missing values. The framework presented in this paper has the following additional advantages. The framework is inherently parallel. Thus, algorithms developed within this framework will also be parallel and will therefore be expected to be efficient for large data sets, a necessity as most commercial data sets, relational or otherwise, are very large. This is compounded by the fact that the algorithms are complex. Also, the parallelism within the framework allows its use in parallel, distributed and heterogeneous databases.
The framework is easily updated and new discovery methods can be readily incorporated within the framework, making it 'general' in the functional sense in addition to the representational sense considered above. The framework provides an intuitive way of dealing with missing data during the discovery process using the concept of Ignorance borrowed from Evidence Theory. The framework consists of a method for representing data and knowledge, and methods for data manipulation or knowledge discovery. We suggest an extension of the conventional definition of mass functions in Evidence Theory for use in Data Mining, as a means to represent evidence of the existence of rules in the database. The discovery process within EDM consists of a series of operations on the mass functions. Each operation is carried out by an EDM operator. We provide a classification for the EDM operators based on the discovery functions performed by them and discuss aspects of the induction, domain and combination operator classes. The application of EDM to two separate Data Mining tasks is also addressed, highlighting the advantages of using a general framework for Data Mining in general and, in particular, using one that is based on Evidence Theory.
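As a rough illustration of the Evidence Theory notions the abstract mentions (mass functions, Ignorance, and combination), the sketch below implements the conventional Dempster rule of combination. It is not the extended mass-function definition or any EDM operator from the paper, which this record does not spell out. The frame `{"yes", "no"}`, the function name `combine`, and all numeric masses are hypothetical, chosen only for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with the standard Dempster rule.

    A mass function is a dict mapping a frozenset of hypotheses to a
    mass; the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            # Disjoint sets contradict each other; track the conflict mass.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so the combined masses sum to 1 again.
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


# Hypothetical frame of discernment and two hypothetical evidence sources.
FRAME = frozenset({"yes", "no"})
# Mass on the whole frame models ignorance, e.g. records with a missing value.
m1 = {frozenset({"yes"}): 0.6, FRAME: 0.4}
m2 = {frozenset({"yes"}): 0.5, frozenset({"no"}): 0.3, FRAME: 0.2}

result = combine(m1, m2)
for subset, mass in result.items():
    print(sorted(subset), round(mass, 4))
```

Dempster's rule is commutative and associative, so evidence gathered from separate data partitions can be combined in any order. That is one plausible reading of the parallelism the abstract describes, though the record does not state that EDM relies on it.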
Pages: 189 - 223
Page count: 35
Related papers
50 in total
  • [31] A framework for a general model-based ATR theory
    Weiss, I
    ALGORITHMS FOR SYNTHETIC APERTURE RADAR IMAGERY XI, 2004, 5427 : 449 - 458
  • [32] Framework for mining community consultation based on discrete choice theory
    Que, S.
    Inderscience Enterprises Ltd., Geneva, Switzerland, (05)
  • [33] The research of image mining framework based on Hilbert Space theory
    You, FC
    Yang, GW
    PROCEEDINGS OF THE 2005 INTERNATIONAL CONFERENCE ON NEURAL NETWORKS AND BRAIN, VOLS 1-3, 2005, : 733 - 737
  • [34] A general framework for time series data mining based on event analysis: Application to the medical domains of electroencephalography and stabilometry
    Lara, Juan A.
    Lizcano, David
    Perez, Aurora
    Valente, Juan P.
    JOURNAL OF BIOMEDICAL INFORMATICS, 2014, 51 : 219 - 241
  • [35] Case-based reasoning framework based on data mining technique
    Ni, ZW
    Yang, SL
    Yang, Y
    Li, FG
    PROCEEDINGS OF THE 2004 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2004, : 2511 - 2514
  • [36] Mining gene expression data based on template theory
    Yang, ZR
    BIOINFORMATICS, 2004, 20 (16) : 2759 - 2766
  • [37] The Research on Algorithms for Data Mining Based on Fuzzy Theory
    Wang, Aimin
    Cui, Hongbin
    Yang, Zhimin
    PROCEEDINGS OF 2008 INTERNATIONAL PRE-OLYMPIC CONGRESS ON COMPUTER SCIENCE, VOL I: COMPUTER SCIENCE AND ENGINEERING, 2008, : 283 - 288
  • [38] A data mining algorithm based on rough set theory
    Zhou, CL
    Li, ZG
    Meng, YJ
    Meng, QL
    ICIA 2004: Proceedings of 2004 International Conference on Information Acquisition, 2004, : 413 - 416
  • [39] The research of algorithm for data mining based on fuzzy theory
    Wang, Aimin
    Li, Jie
    Journal of Digital Information Management, 2013, 11 (05): : 327 - 334
  • [40] Research on Data Mining Algorithms Based on Fuzzy Theory
    Wang, Ai-min
    Yang, Yu-xing
    Yang, Zhi-min
    2008 4TH INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS, NETWORKING AND MOBILE COMPUTING, VOLS 1-31, 2008, : 11660 - +