Processing heterogeneous XML data from multi-source

Cited by: 1
Authors
Wang, Tong [1]
Liu, Da-Xin
Sun, Wei
Lin, Xuanzuo [1]
Affiliations
[1] Northeast Agriculture Univ, Harbin, Peoples R China
Keywords
heterogeneous data; multivalued dependency; XML; multi-source
DOI
10.1117/12.666467
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
XML heterogeneity has recently become a new challenge. This paper proposes a novel clustering strategy that regroups heterogeneous XML sources, because searching a smaller space of similar documents reduces cost. The strategy consists of four steps. First, path features are extracted and mapped into a High-Dimension Vector Space (HDVS). Second, in data preprocessing, two algorithms are applied to reduce redundancy in the XML sources. Third, the heterogeneous documents are clustered. Finally, Multivalued Dependency (MVD) is introduced, since MVD can be redefined according to the range of constraints of XML. The paper also proposes a novel algorithm for discovering minimal MVDs, based on rough sets, that handles non-integrity (incomplete) data. It addresses the problem that non-integrity data in XML interferes with finding the MVDs of XML, so that patterns can be extracted from each cluster.
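The first step named in the abstract, extracting path features and mapping them into a vector space, can be illustrated with a rough sketch. This is not the authors' algorithm: the abstract does not say how features are weighted or compared, so the binary presence vectors, the cosine similarity, and the names `path_features`, `to_vectors`, and `cosine` are all assumptions made for illustration.

```python
import math
import xml.etree.ElementTree as ET


def path_features(xml_text):
    """Return the set of root-to-node tag paths found in one XML document."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(root, "")
    return paths


def to_vectors(docs):
    """Map each document to a binary vector over the shared path vocabulary.

    Assumption: 1 if the path occurs in the document, 0 otherwise.
    """
    feature_sets = [path_features(d) for d in docs]
    vocabulary = sorted(set().union(*feature_sets))
    vectors = [[1 if p in fs else 0 for p in vocabulary] for fs in feature_sets]
    return vocabulary, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical sample documents from three sources.
docs = [
    "<book><title>A</title><author>B</author></book>",
    "<book><title>C</title><price>1</price></book>",
    "<order><item>X</item></order>",
]
vocabulary, vectors = to_vectors(docs)
```

Under these assumptions, the two `book` documents share two of their three paths and score higher against each other than against the `order` document, which shares none. A clustering step would rely on this kind of signal to group similar sources.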
Pages: 8