A Framework for Learning Comprehensible Theories in XML Document Classification

Cited by: 6
Authors
Wu, Jemma [1]
Affiliations
[1] Macquarie Univ, Dept Environm & Geog, Fac Sci, N Ryde, NSW 2109, Australia
Keywords
XML document; machine learning; knowledge representation; semi-supervised learning
DOI
10.1109/TKDE.2011.158
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
XML has become the universal data format for a wide variety of information systems. The large number of XML documents existing on the web and in other information storage systems makes classification an important task. As a typical type of semistructured data, XML documents have both structures and contents. Traditional text learning techniques are not very suitable for XML document classification as structures are not considered. This paper presents a novel complete framework for XML document classification. We first present a knowledge representation method for XML documents which is based on a typed higher order logic formalism. With this representation method, an XML document is represented as a higher order logic term where both its contents and structures are captured. We then present a decision-tree learning algorithm driven by precision/recall breakeven point (PRDT) for the XML classification problem which can produce comprehensible theories. Finally, a semi-supervised learning algorithm is given which is based on the PRDT algorithm and the cotraining framework. Experimental results demonstrate that our framework is able to achieve good performance in both supervised and semi-supervised learning with the bonus of producing comprehensible learning theories.
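As a rough illustration of the precision/recall breakeven point mentioned in the abstract, the sketch below computes it for a ranked list of predictions. It is not the paper's PRDT algorithm: the function name `breakeven_point` and the ranking-based definition (predict the top k items as positive, where k is the number of true positives in the data, so that precision equals recall) are assumptions made for this example.

```python
from typing import Sequence


def breakeven_point(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Precision (equal to recall) when the top-k scored items are predicted
    positive, with k set to the number of positive labels.

    Hypothetical helper for illustration; labels are 1 (positive) or 0.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank items from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # With exactly n_pos predicted positives, precision and recall share
    # the same numerator (true positives) and denominator (n_pos).
    true_pos = sum(label for _, label in ranked[:n_pos])
    return true_pos / n_pos


if __name__ == "__main__":
    example_scores = [0.9, 0.8, 0.7, 0.6, 0.5]
    example_labels = [1, 0, 1, 1, 0]
    print(breakeven_point(example_scores, example_labels))
```

In this example three items are positive, and two of the three top-ranked items are truly positive, so precision and recall both equal 2/3 at the cutoff.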
Pages: 1 - 14
Page count: 14
Related papers
50 in total
  • [21] Sequential pattern mining for structure-based XML document classification
    Garboni, Calin
    Masseglia, Florent
    Trousse, Brigitte
    [J]. ADVANCES IN XML INFORMATION RETRIEVAL AND EVALUATION, 2006, 3977 : 458 - 468
  • [22] Incremental learning for text document classification
    Chen, ZhiHang
    Huang, Liping
    Murphey, Yi L.
    [J]. 2007 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-6, 2007, : 2591 - 2596
  • [23] Learning Document Structure for Retrieval and Classification
    Kumar, Jayant
    Ye, Peng
    Doermann, David
    [J]. 2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 1558 - 1561
  • [24] Arabic Document Classification by Deep Learning
    Alghamdi, Taghreed
    Snoussi, Samia
    Hsairi, Lobna
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (10) : 314 - 321
  • [25] Deep Learning for Technical Document Classification
    Jiang, Shuo
    Hu, Jie
    Magee, Christopher L.
    Luo, Jianxi
    [J]. IEEE TRANSACTIONS ON ENGINEERING MANAGEMENT, 2024, 71 : 1163 - 1179
  • [26] Evolving Accurate and Comprehensible Classification Rules
    Sonstrod, Cecilia
    Johansson, Ulf
    Konig, Rikard
    [J]. 2011 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2011, : 1436 - 1443
  • [27] What makes classification trees comprehensible?
    Piltaver, Rok
    Lustrek, Mitja
    Gams, Matjaz
    Martincic-Ipsic, Sanda
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2016, 62 : 333 - 346
  • [28] XML document versioning
    Chien, SY
    Tsotras, VJ
    Zaniolo, C
    [J]. SIGMOD RECORD, 2001, 30 (03) : 46 - 53
  • [29] XML CLUSTERING FRAMEWORK BASED ON DOCUMENT CONTENT AND STRUCTURE IN A HETEROGENEOUS DIGITAL LIBRARY
    Samadi, Nafisse
    Ravana, Sri Devi
    [J]. MALAYSIAN JOURNAL OF COMPUTER SCIENCE, 2023, 36 (02) : 124 - 147
  • [30] A Flexible Framework for Malicious Open XML Document Detection based on APT Attacks
    Sun, Hung-Min
    Shen, Chi-En
    Weng, Chi-Yao
    [J]. IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (IEEE INFOCOM 2019 WKSHPS), 2019, : 1005 - 1006