Classifying XML documents based on Structure/Content similarity

Cited by: 0
Authors
Xing, Guangming [1]
Guo, Jinhua [2]
Xia, Zhonghang [1]
Affiliations
[1] Western Kentucky Univ, Dept Comp Sci, Bowling Green, KY 42104 USA
[2] Univ Michigan, Comp & Informat Sci Dept, Dearborn, MI 48128 USA
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper, we present a framework for classifying XML documents based on their structure/content similarity. First, we propose an algorithm for computing the edit distance between an ordered labeled tree and a regular hedge grammar. This edit distance gives a more precise measure of structural similarity than existing distance metrics in the literature. Second, we study schema extraction from XML documents and give an effective solution based on the minimum description length (MDL) principle. Our schema extraction method allows a trade-off between schema simplicity and precision according to the user's specification. Third, we discuss the classification of XML documents, together with the representation of XML documents based on their structure and content. The efficacy and efficiency of our methodology have been tested on the data sets from the XML Mining Challenge.
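To make the idea of structure-based classification concrete, the following is a minimal sketch only. It does not implement the tree-to-grammar edit distance described in the abstract. As a simpler stand-in, it uses a top-down (Selkow-style) edit distance between two ordered labeled trees and assigns a document the label of its nearest labeled example. The function names `subtree_size`, `selkow_distance`, and `classify`, the unit costs, and the sample documents are all assumptions made for illustration; element content is ignored.

```python
import xml.etree.ElementTree as ET


def subtree_size(node):
    """Number of elements in the subtree rooted at node."""
    return 1 + sum(subtree_size(child) for child in node)


def selkow_distance(a, b):
    """Top-down edit distance between two ordered labeled trees.

    Assumed unit costs: relabeling a node costs 1, and inserting or
    deleting a whole subtree costs the number of nodes it contains.
    """
    root_cost = 0 if a.tag == b.tag else 1
    xs, ys = list(a), list(b)
    # dp[i][j]: cost of editing the first i children of a
    # into the first j children of b.
    dp = [[0] * (len(ys) + 1) for _ in range(len(xs) + 1)]
    for i in range(1, len(xs) + 1):
        dp[i][0] = dp[i - 1][0] + subtree_size(xs[i - 1])
    for j in range(1, len(ys) + 1):
        dp[0][j] = dp[0][j - 1] + subtree_size(ys[j - 1])
    for i in range(1, len(xs) + 1):
        for j in range(1, len(ys) + 1):
            dp[i][j] = min(
                dp[i - 1][j] + subtree_size(xs[i - 1]),   # delete subtree
                dp[i][j - 1] + subtree_size(ys[j - 1]),   # insert subtree
                dp[i - 1][j - 1] + selkow_distance(xs[i - 1], ys[j - 1]),
            )
    return root_cost + dp[len(xs)][len(ys)]


def classify(xml_text, labeled_examples):
    """Return the label of the structurally nearest example (1-NN)."""
    tree = ET.fromstring(xml_text)
    label, _ = min(
        labeled_examples,
        key=lambda ex: selkow_distance(tree, ET.fromstring(ex[1])),
    )
    return label


# Hypothetical sample documents, one per class.
BOOK = "<book><title/><author/></book>"
ARTICLE = "<article><title/><body><p/><p/></body></article>"
EXAMPLES = [("book", BOOK), ("article", ARTICLE)]
```

Calling `classify("<book><title/><year/></book>", EXAMPLES)` compares the query tree against each example and returns the label with the smallest distance. A grammar-based distance of the kind the abstract describes would compare a document against a whole schema rather than against individual example trees.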
Pages: 444-457
Page count: 14