Indexing and querying segmented web pages: the BlockWeb Model

Cited by: 3
Authors
Bruno, Emmanuel [1]
Faessel, Nicolas [2]
Glotin, Herve [1]
Le Maitre, Jacques [1]
Scholl, Michel [3]
Affiliations
[1] Univ Sud Toulon Var, LSIS, F-83957 La Garde, France
[2] Univ Paul Cezanne, LSIS, F-13397 Marseille 20, France
[3] CNAM, F-75141 Paris 03, France
Source
Keywords
web page segmentation; block importance; block permeability; web image indexing; document indexing; document retrieval;
DOI
10.1007/s11280-011-0124-6
CLC number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We present a model for indexing and querying web pages, based on the hierarchical decomposition of pages into blocks. Splitting a page into blocks has several advantages for page design, indexing and querying: (i) the blocks of a page most similar to a query may be returned instead of the page as a whole; (ii) the importance of a block can be taken into account; and (iii) so can the permeability of a block to its neighbor blocks: a block b is said to be permeable to a block b' in the same page if the content of b' (text, image, etc.) can be (partially) inherited by b upon indexing. An engine implementing this model is described, including the transformation of web pages into block hierarchies, the definition of a dedicated language to express indexing rules, and the storage of indexed blocks in an XML repository. The model is assessed on a dataset of electronic news and on a dataset drawn from web pages of the ImagEval campaign, where it improves the mean average precision of the baseline by 16%.
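The notions of importance and permeability described in the abstract can be illustrated with a small sketch. It is not the BlockWeb engine or its rule language: the `Block` class, the additive weighting, the term-overlap score and the coefficient values are all assumptions chosen only to show how a block with no text of its own (for example an image) could inherit part of a neighbor's content at indexing time.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical block of a page (not the paper's data structure)."""
    name: str
    text: str = ""
    importance: float = 1.0  # assumed multiplicative weight
    # permeability[other] = assumed fraction of other's content inherited
    permeability: dict = field(default_factory=dict)


def local_terms(block):
    """Term frequencies of the block's own text."""
    return Counter(block.text.lower().split())


def indexed_terms(block, by_name):
    """Own terms plus terms inherited from blocks it is permeable to."""
    weights = Counter()
    for term, count in local_terms(block).items():
        weights[term] += float(count)
    for other_name, coeff in block.permeability.items():
        for term, count in local_terms(by_name[other_name]).items():
            weights[term] += coeff * count
    return weights


def score(block, query, by_name):
    """Importance-weighted sum of indexed weights of the query terms."""
    weights = indexed_terms(block, by_name)
    return block.importance * sum(weights[t] for t in query.lower().split())


def rank_blocks(blocks, query):
    """Return (score, block name) pairs, best match first."""
    by_name = {b.name: b for b in blocks}
    return sorted(((score(b, query, by_name), b.name) for b in blocks),
                  reverse=True)


if __name__ == "__main__":
    page = [
        Block("caption", "harbour at sunset"),
        Block("image", permeability={"caption": 0.5}),  # no text of its own
        Block("nav", "home news contact"),
    ]
    for s, name in rank_blocks(page, "harbour"):
        print(name, s)
```

In this sketch the image block becomes retrievable for the query only because it is permeable to its caption, and the query returns ranked blocks rather than the whole page.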
Pages: 623-649
Page count: 27