A Framework for Learning Comprehensible Theories in XML Document Classification

Cited by: 6
Author
Wu, Jemma [1 ]
Affiliation
[1] Macquarie Univ, Dept Environm & Geog, Fac Sci, N Ryde, NSW 2109, Australia
Keywords
XML document; machine learning; knowledge representation; semi-supervised learning
DOI
10.1109/TKDE.2011.158
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
XML has become the universal data format for a wide variety of information systems. The large number of XML documents on the web and in other information storage systems makes classification an important task. As a typical kind of semi-structured data, XML documents have both structure and content. Traditional text learning techniques are poorly suited to XML document classification because they do not take document structure into account. This paper presents a novel, complete framework for XML document classification. We first present a knowledge representation method for XML documents based on a typed higher-order logic formalism. With this method, an XML document is represented as a higher-order logic term that captures both its content and its structure. We then present a decision-tree learning algorithm driven by the precision/recall breakeven point (PRDT) for the XML classification problem; the algorithm produces comprehensible theories. Finally, we give a semi-supervised learning algorithm based on the PRDT algorithm and the co-training framework. Experimental results demonstrate that the framework achieves good performance in both supervised and semi-supervised learning, with the added benefit of producing comprehensible theories.
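The abstract does not say how PRDT uses the precision/recall breakeven point when selecting splits, so the sketch below covers only the metric itself, as background. It is a minimal Python illustration, assuming binary 0/1 labels and real-valued classifier scores; the function name `breakeven_point` is hypothetical and not taken from the paper. It relies on a standard property: if documents are ranked by score and the list is cut at rank P (the number of positive documents), then precision = TP / P and recall = TP / P, so the two coincide there.

```python
from typing import Sequence


def breakeven_point(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Precision/recall breakeven point of a scored ranking (hypothetical helper).

    Assumes labels are 0/1. Documents are ranked by descending score and the
    ranking is cut at rank P = number of positives, where precision equals recall.
    """
    num_pos = sum(labels)
    if num_pos == 0:
        raise ValueError("breakeven point is undefined without positive examples")
    # sorted() is stable, so tied scores keep their original input order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    # True positives among the top-P ranked documents.
    true_pos = sum(label for _, label in ranked[:num_pos])
    return true_pos / num_pos


if __name__ == "__main__":
    # Three positives; the top three scores contain two of them, so BEP = 2/3.
    print(breakeven_point([0.9, 0.8, 0.7, 0.4, 0.2], [1, 0, 1, 1, 0]))
```

Using the rank-P cut-off avoids interpolating between points on the precision/recall curve; a learner could, for example, compare candidate splits by this value, though whether PRDT does exactly that is an assumption here.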
Pages: 1-14
Page count: 14
Related papers
50 in total
  • [1] Learning comprehensible theories from structured data
    Lloyd, JW
    [J]. ADVANCED LECTURES ON MACHINE LEARNING, 2002, 2600 : 203 - 225
  • [2] On classification of XML document transformations
    Dvorakova, Jana
    [J]. DATESO 2005 - DATABASES, TEXTS, SPECIFICATIONS, OBJECTS, 2005, : 69 - 83
  • [3] XML document indexes: A classification
    Catania, B
    Maddalena, A
    Vakali, A
    [J]. IEEE INTERNET COMPUTING, 2005, 9 (05) : 64 - 71
  • [4] XML document classification based on ELM
    Zhao, Xiang-guo
    Wang, Guoren
    Bi, Xin
    Gong, Peizhen
    Zhao, Yuhai
    [J]. NEUROCOMPUTING, 2011, 74 (16) : 2444 - 2451
  • [5] Access Control Framework for XML Document Collections
    Sladic, Goran
    Milosavljevic, Branko
    Konjovic, Zora
    Vidakovic, Milan
    [J]. COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2011, 8 (03) : 591 - 609
  • [6] Formal Framework of XML Document Schema Design
    Zainol, Zurinahni
    Wang, Bing
    [J]. INTERNATIONAL JOURNAL OF INFORMATION RETRIEVAL RESEARCH, 2012, 2 (01) : 21 - 64
  • [7] XML Document Classification Using Extended VSM
    Yang, Jianwu
    Zhang, Fudong
    [J]. FOCUSED ACCESS TO XML DOCUMENTS, 2008, 4862 : 234 - 244
  • [8] Applications of semidefinite programming in XML document classification
    Xia, Zhonghang
    Xing, Guangming
    Qi, Houduo
    Li, Qi
    [J]. SURVEY OF TEXT MINING II: CLUSTERING, CLASSIFICATION, AND RETRIEVAL, 2008, : 129 - +
  • [9] Developing an XML framework for an electronic document delivery system
    Yu, SC
    Chen, RS
    [J]. ELECTRONIC LIBRARY, 2001, 19 (02) : 102 - 110
  • [10] Learning the kernel matrix for XML document clustering
    Yang, JW
    Cheung, WK
    Chen, X
    [J]. 2005 IEEE International Conference on e-Technology, e-Commerce and e-Service, Proceedings, 2005, : 353 - 358