EDM: A general framework for data mining based on evidence theory

Cited by: 51
Authors
Anand, SS
Bell, DA
Hughes, JG
Keywords
data mining; knowledge discovery in databases; uncertainty handling; evidence theory; parallel discovery
DOI
10.1016/0169-023X(95)00038-T
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data Mining or Knowledge Discovery in Databases [1,15,23] is currently one of the most exciting and challenging areas where database techniques are coupled with techniques from Artificial Intelligence and mathematical sub-disciplines to great potential advantage. It has been defined as the non-trivial extraction of implicit, previously unknown and potentially useful information from data. A lot of research effort is being directed towards building tools for discovering interesting patterns which are hidden below the surface in databases. However, most of the work being done in this field has been problem-specific and no general framework has yet been proposed for Data Mining. In this paper we seek to remedy this by proposing EDM (Evidence-based Data Mining), a general framework for Data Mining based on Evidence Theory. Having a general framework for Data Mining offers a number of advantages. It provides a common method for representing knowledge which allows prior knowledge from the user or knowledge discovered by another discovery process to be incorporated into the discovery process. A common knowledge representation also supports the discovery of meta-knowledge from knowledge discovered by different Data Mining techniques. Furthermore, a general framework can provide facilities that are common to most discovery processes, e.g. incorporating domain knowledge and dealing with missing values. The framework presented in this paper has the following additional advantages. The framework is inherently parallel. Thus, algorithms developed within this framework will also be parallel and will therefore be expected to be efficient for large data sets - a necessity as most commercial data sets, relational or otherwise, are very large. This is compounded by the fact that the algorithms are complex. Also, the parallelism within the framework allows its use in parallel, distributed and heterogeneous databases.
The framework is easily updated and new discovery methods can be readily incorporated within the framework, making it 'general' in the functional sense in addition to the representational sense considered above. The framework provides an intuitive way of dealing with missing data during the discovery process using the concept of Ignorance borrowed from Evidence Theory. The framework consists of a method for representing data and knowledge, and methods for data manipulation or knowledge discovery. We suggest an extension of the conventional definition of mass functions in Evidence Theory for use in Data Mining, as a means to represent evidence of the existence of rules in the database. The discovery process within EDM consists of a series of operations on the mass functions. Each operation is carried out by an EDM operator. We provide a classification for the EDM operators based on the discovery functions performed by them and discuss aspects of the induction, domain and combination operator classes. The application of EDM to two separate Data Mining tasks is also addressed, highlighting the advantages of using a general framework for Data Mining in general and, in particular, using one that is based on Evidence Theory.
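As a rough illustration of the ideas named in the abstract (mass functions, Ignorance for missing data, and a combination operator), the sketch below uses the conventional Dempster-Shafer definitions. It is not the extended mass function proposed in the paper, whose details the abstract does not give. The function name `combine`, the frame `FRAME`, and all numeric masses are hypothetical and chosen only for the example.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination for two mass functions.

    Each mass function maps a frozenset (a subset of the frame of
    discernment) to a mass in [0, 1]; the masses sum to 1.
    This is the conventional rule, not the paper's EDM operator.
    """
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + x * y
        else:
            conflict += x * y  # mass falling on contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalise so the remaining masses sum to 1.
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


# Hypothetical frame of discernment for a single rule consequent.
FRAME = frozenset({"buys", "does_not_buy"})

# Evidence from one data partition; mass on FRAME expresses ignorance.
m_a = {frozenset({"buys"}): 0.6, FRAME: 0.4}

# Evidence from a second partition.
m_b = {
    frozenset({"buys"}): 0.5,
    frozenset({"does_not_buy"}): 0.3,
    FRAME: 0.2,
}

# A record with a missing value modelled as total ignorance.
m_missing = {FRAME: 1.0}

pooled = combine(m_a, m_b)
unchanged = combine(m_a, m_missing)
```

Under these assumptions, combining with total ignorance leaves the other mass function unchanged, which is one way to read the abstract's remark about missing data. Dempster's rule is also commutative and associative, so evidence from separate partitions can be pooled in any order; this is offered only as a plausible reading of why such a framework suits parallel discovery.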
Pages: 189-223
Page count: 35