Indexing and querying segmented web pages: the BlockWeb Model

Cited by: 3
Authors
Bruno, Emmanuel [1]
Faessel, Nicolas [2]
Glotin, Herve [1]
Le Maitre, Jacques [1]
Scholl, Michel [3]
Affiliations
[1] Univ Sud Toulon Var, LSIS, F-83957 La Garde, France
[2] Univ Paul Cezanne, LSIS, F-13397 Marseille 20, France
[3] CNAM, F-75141 Paris 03, France
Source
World Wide Web, 2011, 14
Keywords
web page segmentation; block importance; block permeability; web image indexing; document indexing; document retrieval
DOI
10.1007/s11280-011-0124-6
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present a model for indexing and querying web pages, based on the hierarchical decomposition of pages into blocks. Splitting a page into blocks has several advantages for page design, indexing and querying: (i) the blocks of a page most similar to a query can be returned instead of the page as a whole; (ii) the importance of a block can be taken into account; and (iii) so can the permeability of a block to its neighbor blocks: a block b is said to be permeable to a block b' in the same page if the content of b' (text, image, etc.) can be (partially) inherited by b upon indexing. We describe an engine implementing this model, including the transformation of web pages into block hierarchies, the definition of a dedicated language to express indexing rules, and the storage of indexed blocks in an XML repository. The model is assessed on a dataset of electronic news and on a dataset drawn from web pages of the ImagEval campaign, where it improves the mean average precision by 16% over the baseline.
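The notions of block importance and permeability from the abstract can be illustrated with a minimal sketch. It is one possible reading of the abstract and not the formula used in the paper: the function names, the term-frequency weighting, the way importance scales the result, and all coefficient values below are assumptions made for illustration only.

```python
from collections import Counter


def local_index(text):
    """Term-frequency vector built from a block's own text only."""
    return Counter(text.lower().split())


def block_index(block_id, texts, importance, permeability):
    """Index one block: its own terms plus a weighted share of neighbor terms.

    permeability[(b, other)] is a hypothetical coefficient in [0, 1] stating
    how much of `other`'s content block `b` inherits upon indexing.
    importance[b] is a hypothetical weight that scales the whole block index.
    """
    index = Counter(local_index(texts[block_id]))
    for (b, other), coeff in permeability.items():
        if b == block_id:
            for term, tf in local_index(texts[other]).items():
                index[term] += coeff * tf
    alpha = importance.get(block_id, 1.0)
    return {term: alpha * weight for term, weight in index.items()}


# Hypothetical page with three blocks; the image block has no text of its own.
texts = {
    "image": "",
    "caption": "sunset over toulon harbour",
    "nav": "home news contact",
}
importance = {"image": 1.0, "caption": 0.8, "nav": 0.1}
# The image block is permeable to its caption, but not to the navigation bar.
permeability = {("image", "caption"): 0.5}

image_index = block_index("image", texts, importance, permeability)
nav_index = block_index("nav", texts, importance, permeability)
```

In this sketch the image block becomes retrievable through the caption terms it inherits, while the low importance assigned to the navigation block reduces the weight of its terms.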
Pages: 623-649
Page count: 27
Related papers
50 in total
  • [1] Indexing and querying segmented web pages: the BlockWeb Model
    Emmanuel Bruno
    Nicolas Faessel
    Hervé Glotin
    Jacques Le Maitre
    Michel Scholl
    [J]. World Wide Web, 2011, 14 : 623 - 649
  • [2] BlockWeb: an IR Model for Block Structured Web Pages
    Bruno, Emmanuel
    Faessel, Nicolas
    Le Maitre, Jacques
    Scholl, Michel
    [J]. CBMI: 2009 INTERNATIONAL WORKSHOP ON CONTENT-BASED MULTIMEDIA INDEXING, 2009, : 219 - +
  • [3] Querying Web pages with lattice expressions
    Hsu, PY
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1999, E82D (01) : 156 - 164
  • [4] Indexing Temporal Information for Web Pages
    Jin, Peiquan
    Chen, Hong
    Zhao, Xujian
    Li, Xiaowen
    Yue, Lihua
    [J]. COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2011, 8 (03) : 711 - 737
  • [5] A method for indexing web pages using web bots
    Szymanski, BK
    Chung, MS
    [J]. 2001 INTERNATIONAL CONFERENCES ON INFO-TECH AND INFO-NET PROCEEDINGS, CONFERENCE A-G: INFO-TECH & INFO-NET: A KEY TO BETTER LIFE, 2001, : C1 - C6
  • [6] Indexing by Permeability in Block Structured Web Pages
    Bruno, Emmanuel
    Faessel, Nicolas
    Glotin, Herve
    Le Maitre, Jacques
    Scholl, Michel
    [J]. DOCENG'09: PROCEEDINGS OF THE 2009 ACM SYMPOSIUM ON DOCUMENT ENGINEERING, 2009, : 70 - 73
  • [7] Querying and clustering web pages about persons and organizations
    Ye, SR
    Chua, TS
    Kei, JR
    [J]. IEEE/WIC INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, PROCEEDINGS, 2003, : 344 - 350
  • [8] Ontology based text indexing and querying for the semantic web
    Koehler, Jacob
    Philippi, Stephan
    Specht, Michael
    Rueegg, Alexander
    [J]. KNOWLEDGE-BASED SYSTEMS, 2006, 19 (08) : 744 - 754
  • [9] Structural and Semantic Indexing for Supporting Creation of Multilingual Web Pages
    Urae, Hiroshi
    Tezuka, Taro
    Kimura, Fuminori
    Maeda, Akira
    [J]. INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, IMECS 2012, VOL I, 2012, : 662 - 667
  • [10] A proposed multi criteria indexing and ranking model for documents and web pages on large scale data
    Attia, Mohamed
    Abdel-Fattah, Manal A.
    Khedr, Ayman E.
    [J]. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (10) : 8702 - 8715