Clustering Homogeneous XML Documents Using Weighted Similarities on XML Attributes

Cited by: 1

Authors:

Nagwani, Naresh Kumar [1]

Bhansali, Ashok [2]

Affiliations:

[1] NIT, Dept CS&E, Raipur, Madhya Pradesh, India

[2] OPJIT, Dept IT, Raigarh, India

Source:

2010 IEEE 2ND INTERNATIONAL ADVANCE COMPUTING CONFERENCE | 2010

Keywords:

XML Clustering; Weighted Similarity; XML Documents Similarity;

DOI:

10.1109/IADCC.2010.5422926

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

XML (eXtensible Markup Language) has been adopted by a number of software vendors; it has become the standard for data interchange over the web and is platform and application independent. An XML document consists of a number of attributes, such as document data, structure, and style sheet. Clustering is a method of creating groups of similar objects. This paper suggests a weighted similarity measurement approach for detecting the similarity between homogeneous XML documents, and proposes a new clustering technique based on this measurement. Methods for calculating the similarity of a document's structure and styling have been given by a number of researchers, and most of them are based on tree edit distances. For calculating the distance between documents' contents there are a number of text and other similarity techniques, such as cosine, Jaccard, and tf-idf. In this paper both kinds of similarity technique are combined to propose a new distance measurement technique for calculating the distance between a pair of homogeneous XML documents. The proposed clustering model is implemented in Java, an open source technology, and is validated experimentally. Given a collection of XML documents, the distances between documents are calculated and stored in Java collections, and these distances are then used to cluster the XML documents.
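
The idea of combining a structural distance and a content distance into one weighted distance can be illustrated with a minimal sketch. This is not the authors' Java implementation: the function names, the equal default weights, the threshold, and the greedy grouping step are all assumptions made for illustration, and a Jaccard distance over tag paths stands in for the tree edit distance mentioned in the abstract.

```python
import math
import re
import xml.etree.ElementTree as ET
from collections import Counter


def tag_paths(elem, prefix=""):
    """Return the set of root-to-node tag paths of an element tree."""
    path = f"{prefix}/{elem.tag}"
    paths = {path}
    for child in elem:
        paths |= tag_paths(child, path)
    return paths


def structure_distance(a, b):
    """Jaccard distance over tag paths (stand-in for a tree edit distance)."""
    pa, pb = tag_paths(a), tag_paths(b)
    return 1.0 - len(pa & pb) / len(pa | pb)


def term_counts(elem):
    """Term-frequency vector of all text contained in the document."""
    return Counter(re.findall(r"\w+", " ".join(elem.itertext()).lower()))


def content_distance(a, b):
    """Cosine distance between the term-frequency vectors of two documents."""
    ta, tb = term_counts(a), term_counts(b)
    if not ta and not tb:
        return 0.0  # two documents without text are treated as identical
    dot = sum(ta[t] * tb[t] for t in ta.keys() & tb.keys())
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(
        sum(v * v for v in tb.values())
    )
    return max(0.0, 1.0 - dot / norm) if norm else 1.0


def weighted_distance(a, b, w_structure=0.5, w_content=0.5):
    """Weighted sum of the two distances; the weights are illustrative."""
    return w_structure * structure_distance(a, b) + w_content * content_distance(a, b)


def cluster(docs, threshold):
    """Greedy grouping: a document joins the first cluster that already
    holds a member within `threshold`, otherwise it starts a new cluster."""
    clusters = []
    for i, doc in enumerate(docs):
        for group in clusters:
            if any(weighted_distance(doc, docs[j]) <= threshold for j in group):
                group.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Three homogeneous documents: same structure, different content.
docs = [
    ET.fromstring(s)
    for s in (
        "<book><title>XML clustering</title><author>Rao</author></book>",
        "<book><title>Clustering XML data</title><author>Rao</author></book>",
        "<book><title>Cooking pasta</title><author>Bianchi</author></book>",
    )
]
print(cluster(docs, threshold=0.3))  # [[0, 1], [2]]
```

Because the three sample documents share one structure, their structural distance is zero and the grouping is decided by content alone; with heterogeneous documents the structural term would contribute as well.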

Citation

Pages: 369 / +

Number of pages: 2

50 entries in total

[1] XEdge: Clustering Homogeneous and Heterogeneous XML Documents Using Edge Summaries
Antonellis, Panagiotis
Makris, Christos
Tsirakis, Nikos
[J]. APPLIED COMPUTING 2008, VOLS 1-3, 2008, : 1081 - 1088
[2] Clustering of XML documents
Guillaume, D
Murtagh, F
[J]. COMPUTER PHYSICS COMMUNICATIONS, 2000, 127 (2-3) : 215 - 227
[3] A weighted common structure based clustering technique for XML documents
Hwang, Jeong Hee
Ryu, Keun Ho
[J]. JOURNAL OF SYSTEMS AND SOFTWARE, 2010, 83 (07) : 1267 - 1274
[4] Clustering XML Documents Using Frequent Subtrees
Kutty, Sangeetha
Tran, Tien
Nayak, Richi
Li, Yuefeng
[J]. ADVANCES IN FOCUSED RETRIEVAL, 2009, 5631 : 436 - 445
[5] Using structural similarity for clustering XML documents
Aitelhadj, Ali
Boughanem, Mohand
Mezghiche, Mohamed
Souam, Fatiha
[J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 32 (01) : 109 - 139
[6] Clustering XML documents using structural summaries
Dalamagas, T
Cheng, T
Winkel, KJ
Sellis, T
[J]. CURRENT TRENDS IN DATABASE TECHNOLOGY - EDBT 2004 WORKSHOPS, PROCEEDINGS, 2004, 3268 : 547 - 556
[7] Using structural similarity for clustering XML documents
Ali Aïtelhadj
Mohand Boughanem
Mohamed Mezghiche
Fatiha Souam
[J]. Knowledge and Information Systems, 2012, 32 : 109 - 139
[8] Clustering schemaless XML documents
Shen, Y
Wang, B
[J]. ON THE MOVE TO MEANINGFUL INTERNET SYSTEMS 2003: COOPIS, DOA, AND ODBASE, 2003, 2888 : 767 - 784
[9] Clustering XML documents by structure
Dalamagas, T
Cheng, T
Winkel, KJ
Sellis, T
[J]. METHODS AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2004, 3025 : 112 - 121
[10] Clustering XML Documents by Structure
Lesniewska, Anna
[J]. ADVANCES IN DATABASES AND INFORMATION SYSTEMS, 2010, 5968 : 238 - 246
