Preparations for semantics-based XML mining

Cited by: 26
Authors
Lee, JW
Lee, K
Kim, W
DOI
10.1109/ICDM.2001.989538
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
XML allows users to define elements using arbitrary words and organize them in a nested structure. These features of XML offer both challenges and opportunities in information retrieval, document management, and data mining. In this paper, we propose a new methodology for preparing XML documents for quantitative determination of similarity between them, taking into account XML semantics (i.e., the meanings of the elements and the nested structures of XML documents). Accurate quantitative determination of similarity between XML documents provides an important basis for a variety of applications in XML document mining and processing. Experiments with XML documents show that our methodology provides a 50-100% improvement in determining similarity over the traditional vector-space model, which considers only term frequency, and achieves 100% accuracy in identifying the category of each document from an online bookstore.
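
For orientation, the term-frequency vector-space baseline that the abstract compares against can be sketched roughly as follows. This is only an illustration of that baseline, not the paper's methodology, which the abstract does not spell out. The function names (`term_frequencies`, `cosine_similarity`), the choice to count both tag names and text tokens, and the sample documents are all assumptions made for this sketch.

```python
import math
import xml.etree.ElementTree as ET
from collections import Counter


def term_frequencies(xml_text: str) -> Counter:
    """Count element tag names and whitespace-separated text tokens.

    Assumption: tags and text are treated as one flat bag of terms,
    so nesting and element meaning are ignored entirely.
    """
    root = ET.fromstring(xml_text)
    counts = Counter()
    for elem in root.iter():
        counts[elem.tag.lower()] += 1
        if elem.text:
            counts.update(tok.lower() for tok in elem.text.split())
    return counts


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents: doc_a and doc_b describe the same book
# but use different element names for the same concepts.
doc_a = "<book><title>XML Mining</title><author>Lee</author></book>"
doc_b = "<book><name>XML Mining</name><writer>Lee</writer></book>"
doc_c = "<cd><artist>Kim</artist></cd>"

tf_a = term_frequencies(doc_a)
tf_b = term_frequencies(doc_b)
tf_c = term_frequencies(doc_c)

sim_ab = cosine_similarity(tf_a, tf_b)
sim_ac = cosine_similarity(tf_a, tf_c)
print(f"a vs b: {sim_ab:.3f}")
print(f"a vs c: {sim_ac:.3f}")
```

In this sketch, `doc_a` and `doc_b` lose similarity only because `title`/`name` and `author`/`writer` are counted as unrelated terms. That is the kind of gap a semantics-aware preparation step would presumably aim to close.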
Pages: 345-352
Page count: 8