Mining XML data: A clustering approach

Cited by: 0
Authors
Saraee, M [1]
Aljibouri, JM [1]
Affiliation
[1] Univ Salford, Salford M5 4WT, Lancs, England
Keywords
XML; data mining; Nearest Neighbour; XQuery;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
XML has become a very popular format for representing semi-structured data, and the amount of XML data on the web is growing as a result. This creates a need for languages and tools to manage collections of XML documents and to mine interesting information from them. Several XML mining techniques have been proposed. However, mining the data held in XML documents has received little attention, because the data mining community has focused on techniques for extracting common structure from heterogeneous XML data. This project aims to mine XML data using the XML query language XQuery. The data mining technique used is clustering with the Nearest Neighbour algorithm. The algorithm will be incorporated into an XQuery expression which, when run by an XQuery implementation, will cluster distance-based data within the XML document into groups, where the distance used to group the data is set by a given threshold. The implementation is intended to be generic and to provide a user interface that lets the user load an XML document, choose the data within that document to be clustered, input the threshold, and receive the clustered result in an output file. This work would allow XML distance data to be clustered with the Nearest Neighbour algorithm using XQuery, thereby providing a needed data mining implementation for XML data.
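The clustering step described in the abstract can be illustrated with a minimal sketch. It is written in Python rather than XQuery, and it is not the authors' implementation: the abstract does not give the algorithm's details, so the sketch assumes the common textbook form of Nearest Neighbour clustering, in which each item joins the cluster of its nearest already-clustered item when that distance is within the threshold and otherwise starts a new cluster. The element name `distance`, the sample document, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def nearest_neighbour_clusters(values, threshold):
    """Cluster numeric values with a threshold-based Nearest Neighbour rule.

    Assumption: an item joins the cluster containing its nearest
    already-clustered item if that distance is <= threshold;
    otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of values, in arrival order
    for value in values:
        best_cluster, best_dist = None, None
        # Find the nearest item among everything clustered so far.
        for cluster in clusters:
            for member in cluster:
                dist = abs(value - member)
                if best_dist is None or dist < best_dist:
                    best_cluster, best_dist = cluster, dist
        if best_cluster is not None and best_dist <= threshold:
            best_cluster.append(value)
        else:
            clusters.append([value])
    return clusters


def cluster_xml(xml_text, tag, threshold):
    """Read every <tag> element's text as a number and cluster the values."""
    root = ET.fromstring(xml_text)
    values = [float(element.text) for element in root.iter(tag)]
    return nearest_neighbour_clusters(values, threshold)


# Hypothetical sample document; the element names are illustrative only.
SAMPLE = """<points>
  <point><distance>1.0</distance></point>
  <point><distance>1.5</distance></point>
  <point><distance>5.0</distance></point>
  <point><distance>5.4</distance></point>
  <point><distance>9.0</distance></point>
</points>"""
```

Calling `cluster_xml(SAMPLE, "distance", 1.0)` groups the five values into `[[1.0, 1.5], [5.0, 5.4], [9.0]]`. The result of this rule depends on the order in which items are read, so a different document order can produce different groups.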
Pages: 283-288
Page count: 6