Content-aware DataGuides: Interleaving IR and DB indexing techniques for efficient retrieval of textual XML data

Cited by: 0
Authors
Weigel, F [1]
Meuss, H
Bry, F
Schulz, KU
Affiliations
[1] Univ Munich, Inst Comp Sci, D-80539 Munich, Germany
[2] European So Observ, D-8046 Garching, Germany
[3] Univ Munich, Ctr Informat & Language Proc, D-80539 Munich, Germany
Keywords
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Not only since the advent of XML, many applications call for efficient structured document retrieval, challenging both Information Retrieval (IR) and database (DB) research. Most approaches combining indexing techniques from both fields still separate path and content matching, merging the hits in an expensive join. This paper shows that retrieval is significantly accelerated by processing text and structure simultaneously. The Content-Aware DataGuide (CADG) interleaves IR and DB indexing techniques to minimize path matching and suppress joins at query time, also saving needless I/O operations during retrieval. Extensive experiments prove the CADG to outperform the DataGuide [11,14] by a factor 5 to 200 on average. For structurally unselective queries, it is over 400 times faster than the DataGuide. The best results were achieved on large collections of heterogeneously structured textual documents.
Pages: 378-393
Page count: 16
Related papers
8 items in total
  • [1] Structured content-aware discovery for improving XML data consistency
    Vo, Loan T. H.
    Cao, Jinli
    Rahayu, Wenny
    Hong-Quang Nguyen
    INFORMATION SCIENCES, 2013, 248 : 168 - 190
  • [2] Content-aware Partial Compression for Big Textual Data Analysis Acceleration
    Dong, Dapeng
    Herbert, John
    2014 IEEE 6TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM), 2014, : 320 - 325
  • [3] Content-Aware Partial Compression for Textual Big Data Analysis in Hadoop
    Dong, Dapeng
    Herbert, John
    IEEE TRANSACTIONS ON BIG DATA, 2018, 4 (04) : 459 - 472
  • [4] Semantic-based Structural and Content indexing for the efficient retrieval of queries over large XML data repositories
    Alghamdi, Norah Saleh
    Rahayu, Wenny
    Pardede, Eric
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2014, 37 : 212 - 231
  • [5] A Content-aware Data-plane for Efficient and Scalable Video Delivery
    Desmouceaux, Yoann
    Enguehard, Marcel
    Nguyen, Victor
    Pfister, Pierre
    Shao, Wenqin
    Vyncke, Eric
    2019 IFIP/IEEE SYMPOSIUM ON INTEGRATED NETWORK AND SERVICE MANAGEMENT (IM), 2019, : 10 - 18
  • [6] Content-Aware Proportional Caching for Efficient Data Delivery over Satellite Network
    Zhang, Jiaran
    Yang, Yating
    Sang, Huanyu
    Gao, Zhuoqun
    Song, Tian
    IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023, : 4890 - 4895
  • [7] Towards Efficient Content-aware Search over Encrypted Outsourced Data in Cloud
    Fu, Zhangjie
    Sun, Xingming
    Ji, Sai
    Xie, Guowu
    IEEE INFOCOM 2016 - THE 35TH ANNUAL IEEE INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS, 2016,
  • [8] An indexing scheme for energy-efficient processing of content-based retrieval queries on a wireless data stream
    Chung, Yon Dohn
    INFORMATION SCIENCES, 2007, 177 (02) : 525 - 542