Data abstractions for decision tree induction

Cited by: 4
Authors
Kudoh, Y [1]
Haraguchi, M [1]
Okubo, Y [1]
Affiliations
[1] Hokkaido Univ, Div Elect & Informat Engn, Sapporo, Hokkaido 0608628, Japan
Keywords
data mining; machine learning; abstraction; classification
DOI
10.1016/S0304-3975(02)00178-0
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
When the descriptions of data values in a database are too concrete or too detailed, the computational cost of discovering useful knowledge from the database generally increases. Furthermore, the discovered knowledge tends to become complicated. A notion of data abstraction seems useful for resolving this kind of problem: after abstraction we obtain a smaller and more general database, from which we can quickly extract more abstract knowledge that is expected to be easier to understand. In general, however, several abstractions are possible, so we have to carefully select the one according to which the original database is generalized. An inadequate selection would reduce the accuracy of the extracted knowledge. From this point of view, we propose in this paper a method for selecting an appropriate abstraction from the possible ones, assuming that our task is to construct a decision tree from a relational database. Suppose that, for each attribute in a relational database, we have a class of possible abstractions for the attribute values. As an appropriate abstraction for each attribute, we prefer one such that, even after the abstraction, the distribution of target classes necessary to perform our classification task is preserved within an acceptable error range given by the user. Using the selected abstractions, the original database can be transformed into a small generalized database written in abstract values. We can therefore expect to construct, from the generalized database, a decision tree that is much smaller than one constructed from the original database. Furthermore, such a size reduction can be justified under some theoretical assumptions. The appropriateness of an abstraction is precisely defined in terms of standard information theory. Therefore, we call our abstraction framework Information Theoretical Abstraction.
We show some experimental results obtained with ITA, a system that implements our abstraction method. These results verify that our method is very effective in reducing the size of the constructed decision tree without substantially worsening classification error. (C) 2002 Elsevier Science B.V. All rights reserved.
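The selection idea in the abstract can be illustrated with a minimal sketch. It is not the ITA system or the paper's exact criterion: it assumes that "preserving the class distribution" is measured as the drop in information gain between the original attribute values and their abstracted values, and that the user's acceptable error range is a single threshold. The names `select_abstraction`, `candidates`, and `max_loss`, and the toy data, are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in class entropy obtained by grouping rows by attribute value."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_abstraction(values, labels, candidates, max_loss):
    """Pick the candidate mapping that loses the least class information.

    Assumption: loss = gain(original values) - gain(abstracted values).
    Returns (name, loss), or None if every candidate exceeds max_loss.
    """
    base = information_gain(values, labels)
    best = None
    for name, mapping in candidates.items():
        abstract_values = [mapping[v] for v in values]
        loss = base - information_gain(abstract_values, labels)
        if loss <= max_loss and (best is None or loss < best[1]):
            best = (name, loss)
    return best


# Toy data (hypothetical): one attribute, one target class per row.
values = ["apple", "pear", "carrot", "leek"]
labels = ["sweet", "sweet", "savory", "savory"]
candidates = {
    # Groups values so that each abstract value still determines the class.
    "by_kind": {"apple": "fruit", "pear": "fruit",
                "carrot": "vegetable", "leek": "vegetable"},
    # Groups values so that each abstract value mixes both classes.
    "mixed": {"apple": "g1", "carrot": "g1", "pear": "g2", "leek": "g2"},
}
best = select_abstraction(values, labels, candidates, max_loss=0.1)
```

In this toy case `by_kind` is selected because merging values within a class loses no class information, while `mixed` is rejected because it erases the class distinction entirely. Applying the chosen mapping to the attribute column would yield the smaller generalized table from which a decision tree is then built.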
Pages: 387-416
Page count: 30