EDM: A general framework for data mining based on evidence theory

Cited by: 51
Authors
Anand, SS
Bell, DA
Hughes, JG
Keywords
data mining; knowledge discovery in databases; uncertainty handling; evidence theory; parallel discovery
DOI
10.1016/0169-023X(95)00038-T
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data Mining or Knowledge Discovery in Databases [1,15,23] is currently one of the most exciting and challenging areas where database techniques are coupled with techniques from Artificial Intelligence and mathematical sub-disciplines to great potential advantage. It has been defined as the non-trivial extraction of implicit, previously unknown and potentially useful information from data. A lot of research effort is being directed towards building tools for discovering interesting patterns which are hidden below the surface in databases. However, most of the work being done in this field has been problem-specific and no general framework has yet been proposed for Data Mining. In this paper we seek to remedy this by proposing EDM (Evidence-based Data Mining), a general framework for Data Mining based on Evidence Theory. Having a general framework for Data Mining offers a number of advantages. It provides a common method for representing knowledge, which allows prior knowledge from the user or knowledge discovered by another discovery process to be incorporated into the discovery process. A common knowledge representation also supports the discovery of meta-knowledge from knowledge discovered by different Data Mining techniques. Furthermore, a general framework can provide facilities that are common to most discovery processes, e.g. incorporating domain knowledge and dealing with missing values. The framework presented in this paper has the following additional advantages. The framework is inherently parallel. Thus, algorithms developed within this framework will also be parallel and will therefore be expected to be efficient for large data sets, a necessity as most commercial data sets, relational or otherwise, are very large. This is compounded by the fact that the algorithms are complex. Also, the parallelism within the framework allows its use in parallel, distributed and heterogeneous databases.
The framework is easily updated and new discovery methods can be readily incorporated within it, making it 'general' in the functional sense in addition to the representational sense considered above. The framework provides an intuitive way of dealing with missing data during the discovery process using the concept of Ignorance borrowed from Evidence Theory. The framework consists of a method for representing data and knowledge, and methods for data manipulation or knowledge discovery. We suggest an extension of the conventional definition of mass functions in Evidence Theory for use in Data Mining, as a means to represent evidence of the existence of rules in the database. The discovery process within EDM consists of a series of operations on the mass functions. Each operation is carried out by an EDM operator. We provide a classification for the EDM operators based on the discovery functions performed by them and discuss aspects of the induction, domain and combination operator classes. The application of EDM to two separate Data Mining tasks is also addressed, highlighting the advantages of using a general framework for Data Mining in general and, in particular, using one that is based on Evidence Theory.
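The abstract's references to mass functions, Ignorance and a combination operator can be illustrated with the conventional Dempster rule of combination from Evidence Theory. The sketch below is only that textbook rule, not the extended mass-function definition or the EDM operators proposed in the paper; the function name `combine`, the frame `FRAME` and the numeric masses are hypothetical and chosen purely for illustration. Mass assigned to the whole frame stands for ignorance, which is one plausible reading of how missing values might be represented.

```python
from itertools import product


def combine(m1, m2):
    """Textbook Dempster's rule for two mass functions.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1];
    the masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            # Agreeing evidence supports the intersection of the two sets.
            combined[common] = combined.get(common, 0.0) + wa * wb
        else:
            # Disjoint sets contribute to the conflict mass.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so the combined masses sum to 1 again.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical frame of discernment for a single rule consequent.
FRAME = frozenset({"buys", "does_not_buy"})

# Mass on FRAME itself represents ignorance (e.g. records with missing values).
m1 = {frozenset({"buys"}): 0.6, FRAME: 0.4}
m2 = {frozenset({"buys"}): 0.5, frozenset({"does_not_buy"}): 0.2, FRAME: 0.3}

result = combine(m1, m2)
print(round(result[frozenset({"buys"})], 3))  # 0.773
```

A source that is entirely ignorant, `{FRAME: 1.0}`, leaves the other mass function unchanged under this rule, which is why ignorance is a natural placeholder when a source has nothing to say about a record.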
Pages: 189-223
Page count: 35
Related papers
50 in total
  • [21] An ACS-based framework for fuzzy data mining
    Hong, Tzung-Pei
    Tung, Ya-Fang
    Wang, Shyue-Liang
    Wu, Min-Thai
    Wu, Yu-Lung
    EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (09) : 11844 - 11852
  • [22] An efficient framework for intrusion detection based on data mining
    Li, Weidong
    Zhang, Kejun
    Li, Boqun
    Yang, Bingru
    2005 ICSC CONGRESS ON COMPUTATIONAL INTELLIGENCE METHODS AND APPLICATIONS (CIMA 2005), 2005, : 55 - 58
  • [23] Evaluating the Reliability Coefficient of a Sensor Based on the Training Data Within the Framework of Evidence Theory
    Zhu, Jingwei
    Wang, Xiaodan
    Song, Yafei
    IEEE ACCESS, 2018, 6 : 30592 - 30601
  • [24] A Classification Based Framework for Privacy Preserving Data Mining
    Tripathy, Animesh
    Dansana, Jayanti
    Mishra, Ranjita
    PROCEEDINGS OF THE 2012 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI'12), 2012, : 660 - 666
  • [25] Evidence-based framework for a theory of inheritance
    1600, Morgan Kaufmann Publ Inc, San Mateo, CA, USA (02):
  • [26] Evidence-based framework for a theory of inheritance
    1600, Morgan Kaufmann Publ Inc, San Mateo, CA, USA (02):
  • [27] General Data Mining Model System Based on Sample Data Division
    Chen, Yan
    Yang, Ming
    Zhang, Lin
    2009 SECOND INTERNATIONAL SYMPOSIUM ON KNOWLEDGE ACQUISITION AND MODELING: KAM 2009, VOL 2, 2009, : 182 - 185
  • [28] Detecting and investigating crime by means of data mining: a general crime matching framework
    Keyvanpour, MohammadReza
    Javideh, Mostafa
    Ebrahimi, Mohammad Reza
    WORLD CONFERENCE ON INFORMATION TECHNOLOGY (WCIT-2010), 2011, 3
  • [29] A general framework for mining concept-drifting data streams with evolvable features
    Peng, Jiaqi
    Guo, Jinxia
    Yang, Qinli
    Lu, Jianyun
    Shao, Junming
    2021 21ST IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2021), 2021, : 1276 - 1281
  • [30] A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions
    Gao, Jing
    Fan, Wei
    Han, Jiawei
    Yu, Philip S.
    PROCEEDINGS OF THE SEVENTH SIAM INTERNATIONAL CONFERENCE ON DATA MINING, 2007, : 3 - +