Clustering Homogeneous XML Documents Using Weighted Similarities on XML Attributes

Cited by: 1
Authors
Nagwani, Naresh Kumar [1 ]
Bhansali, Ashok [2 ]
Affiliations
[1] NIT, Dept CS&E, Raipur, Madhya Pradesh, India
[2] OPJIT, Dept IT, Raigarh, India
Keywords
XML Clustering; Weighted Similarity; XML Documents Similarity;
DOI
10.1109/IADCC.2010.5422926
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
XML (eXtensible Markup Language) has been adopted by a number of software vendors; it has become the standard for data interchange over the web and is platform- and application-independent. An XML document consists of a number of attributes, such as document data, structure, and style sheet. Clustering is a method of creating groups of similar objects. In this paper, a weighted similarity measurement approach for detecting the similarity between homogeneous XML documents is suggested. Using this similarity measurement, a new clustering technique is also proposed. Methods for calculating the similarity of documents' structure and styling have been given by a number of researchers, and most of them are based on tree edit distances. For calculating the distance between documents' contents, there are a number of text and other similarity techniques, such as cosine, Jaccard, and tf-idf. In this paper, both kinds of similarity technique are combined to propose a new distance measurement technique for calculating the distance between a pair of homogeneous XML documents. The proposed clustering model is implemented using the open-source technology Java and is validated experimentally. Given a collection of XML documents, the distances between documents are calculated and stored in Java collections, and these distances are then used to cluster the XML documents.
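The combination described in the abstract (a structural distance and a content distance merged by weights, followed by clustering on the stored pairwise distances) might be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the class name `WeightedXmlClustering`, the `structureWeight` and `threshold` parameters, and the single-link threshold rule are all assumptions, and Jaccard distance over element-path sets is used here as a simple stand-in for the tree edit distances the abstract mentions.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: weighted structure + content distance, then threshold clustering. */
class WeightedXmlClustering {

    /** Jaccard distance between two sets: 1 - |A ∩ B| / |A ∪ B| (0 for two empty sets). */
    static double jaccardDistance(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 0.0;
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return 1.0 - (double) intersection.size() / union.size();
    }

    /** Weighted sum of the two distances; structureWeight is assumed to lie in [0, 1]. */
    static double weightedDistance(double structureDist, double contentDist, double structureWeight) {
        return structureWeight * structureDist + (1.0 - structureWeight) * contentDist;
    }

    /**
     * Pairwise distances for a collection of documents. Document i is represented by
     * paths.get(i) (its element paths, an assumed stand-in for tree structure) and
     * tokens.get(i) (its content tokens).
     */
    static double[][] distanceMatrix(List<Set<String>> paths, List<Set<String>> tokens,
                                     double structureWeight) {
        int n = paths.size();
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                double s = jaccardDistance(paths.get(i), paths.get(j));
                double c = jaccardDistance(tokens.get(i), tokens.get(j));
                d[i][j] = weightedDistance(s, c, structureWeight);
                d[j][i] = d[i][j];
            }
        }
        return d;
    }

    /** Single-link grouping (an assumed rule): documents within threshold are linked into one cluster. */
    static int[] cluster(double[][] d, double threshold) {
        int n = d.length;
        int[] label = new int[n];
        Arrays.fill(label, -1);
        int next = 0;
        for (int i = 0; i < n; i++) {
            if (label[i] != -1) continue;
            label[i] = next;
            Deque<Integer> stack = new ArrayDeque<>();
            stack.push(i);
            while (!stack.isEmpty()) {
                int u = stack.pop();
                for (int v = 0; v < n; v++) {
                    if (label[v] == -1 && d[u][v] <= threshold) {
                        label[v] = next;
                        stack.push(v);
                    }
                }
            }
            next++;
        }
        return label;
    }
}
```

A caller would build the path and token sets for each document, pass them to `distanceMatrix` with a chosen weight, and hand the result to `cluster`. A weight near 1 makes the grouping depend mostly on structure, and a weight near 0 mostly on content.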
Pages: 369 / +
Number of pages: 2
Related papers
50 in total
  • [1] XEdge: Clustering Homogeneous and Heterogeneous XML Documents Using Edge Summaries
    Antonellis, Panagiotis
    Makris, Christos
    Tsirakis, Nikos
    [J]. APPLIED COMPUTING 2008, VOLS 1-3, 2008, : 1081 - 1088
  • [2] Clustering of XML documents
    Guillaume, D
    Murtagh, F
    [J]. COMPUTER PHYSICS COMMUNICATIONS, 2000, 127 (2-3) : 215 - 227
  • [3] A weighted common structure based clustering technique for XML documents
    Hwang, Jeong Hee
    Ryu, Keun Ho
    [J]. JOURNAL OF SYSTEMS AND SOFTWARE, 2010, 83 (07) : 1267 - 1274
  • [4] Clustering XML Documents Using Frequent Subtrees
    Kutty, Sangeetha
    Tran, Tien
    Nayak, Richi
    Li, Yuefeng
    [J]. ADVANCES IN FOCUSED RETRIEVAL, 2009, 5631 : 436 - 445
  • [5] Using structural similarity for clustering XML documents
    Aitelhadj, Ali
    Boughanem, Mohand
    Mezghiche, Mohamed
    Souam, Fatiha
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 32 (01) : 109 - 139
  • [6] Clustering XML documents using structural summaries
    Dalamagas, T
    Cheng, T
    Winkel, KJ
    Sellis, T
    [J]. CURRENT TRENDS IN DATABASE TECHNOLOGY - EDBT 2004 WORKSHOPS, PROCEEDINGS, 2004, 3268 : 547 - 556
  • [7] Using structural similarity for clustering XML documents
    Ali Aïtelhadj
    Mohand Boughanem
    Mohamed Mezghiche
    Fatiha Souam
    [J]. Knowledge and Information Systems, 2012, 32 : 109 - 139
  • [8] Clustering schemaless XML documents
    Shen, Y
    Wang, B
    [J]. ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS 2003: COOPIS, DOA, AND ODBASE, 2003, 2888 : 767 - 784
  • [9] Clustering XML documents by structure
    Dalamagas, T
    Cheng, T
    Winkel, KJ
    Sellis, T
    [J]. METHODS AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 3025 : 112 - 121
  • [10] Clustering XML Documents by Structure
    Lesniewska, Anna
    [J]. ADVANCES IN DATABASES AND INFORMATION SYSTEMS, 2010, 5968 : 238 - 246