Indexing and querying segmented web pages: the BlockWeb Model

Authors
Emmanuel Bruno
Nicolas Faessel
Hervé Glotin
Jacques Le Maitre
Michel Scholl
Affiliations
[1] Université du Sud Toulon-Var, LSIS
[2] Université Paul Cézanne, LSIS
[3] CNAM, Cedric/Wisdom
Source
World Wide Web | 2011 / Vol. 14
Keywords
web page segmentation; block importance; block permeability; web image indexing; document indexing; document retrieval; H.3.1; H.3.3
Abstract
This paper presents a model for indexing and querying web pages, based on the hierarchical decomposition of pages into blocks. Splitting a page into blocks has several advantages for page design, indexing and querying: (i) the blocks of a page most similar to a query can be returned instead of the page as a whole; (ii) the importance of a block can be taken into account; and (iii) so can the permeability of blocks to neighboring blocks: a block b is said to be permeable to a block b′ in the same page if the content of b′ (text, image, etc.) can be (partially) inherited by b upon indexing. An engine implementing this model is described, including the transformation of web pages into block hierarchies, the definition of a dedicated language for expressing indexing rules, and the storage of indexed blocks in an XML repository. The model is assessed on a dataset of electronic news and on a dataset drawn from web pages of the ImagEval campaign, where it improves the mean average precision of the baseline by 16%.
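The notion of permeability in the abstract can be illustrated with a small sketch. The abstract does not give the model's formulas, so everything below is an assumption made for illustration: the `Block` class, the linear inheritance of term frequencies scaled by a permeability coefficient, the cosine scoring, and the multiplication by an importance weight are all hypothetical and are not the BlockWeb engine or its rule language.

```python
from collections import Counter
from dataclasses import dataclass, field
import math


@dataclass
class Block:
    """A page block; all fields are hypothetical names for this sketch."""
    name: str
    text: str
    importance: float = 1.0
    # permeability[other] = assumed fraction of block `other` inherited on indexing
    permeability: dict = field(default_factory=dict)


def local_terms(block):
    """Term frequencies of the block's own text (naive whitespace tokenizer)."""
    return Counter(block.text.lower().split())


def indexed_terms(block, page):
    """Own terms plus the share inherited from blocks it is permeable to."""
    index = Counter()
    for term, tf in local_terms(block).items():
        index[term] += float(tf)
    for other_name, alpha in block.permeability.items():
        for term, tf in local_terms(page[other_name]).items():
            index[term] += alpha * tf  # assumed linear inheritance
    return index


def score(block, page, query):
    """Cosine similarity of query and block index, scaled by importance."""
    index = indexed_terms(block, page)
    q = Counter(query.lower().split())
    dot = sum(q[t] * index[t] for t in q)
    norm = math.sqrt(sum(v * v for v in index.values())) * math.sqrt(
        sum(v * v for v in q.values())
    )
    return block.importance * dot / norm if norm else 0.0


def rank_blocks(page, query):
    """Block names ordered from most to least similar to the query."""
    return sorted(page, key=lambda n: score(page[n], page, query), reverse=True)


# Toy page: the image block has no descriptive text of its own, but it is
# permeable to its caption, so it becomes retrievable by the caption's words.
page = {
    "image": Block("image", "img_0042", permeability={"caption": 0.5}),
    "caption": Block("caption", "sunset over toulon harbour"),
    "nav": Block("nav", "home news contact"),
}
ranking = rank_blocks(page, "toulon sunset")
```

In this toy page the image block matches the query only through the terms it inherits from its caption, which is the effect the abstract attributes to permeability; returning `ranking` rather than a page-level score corresponds to advantage (i).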
Pages: 623-649
Page count: 26
Related papers
50 in total
  • [1] Indexing and querying segmented web pages: the BlockWeb Model
    Bruno, Emmanuel
    Faessel, Nicolas
    Glotin, Herve
    Le Maitre, Jacques
    Scholl, Michel
    [J]. WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2011, 14 (5-6) : 623 - 649
  • [2] BlockWeb: an IR Model for Block Structured Web Pages
    Bruno, Emmanuel
    Faessel, Nicolas
    Le Maitre, Jacques
    Scholl, Michel
    [J]. CBMI: 2009 INTERNATIONAL WORKSHOP ON CONTENT-BASED MULTIMEDIA INDEXING, 2009, : 219 - +
  • [3] Querying Web pages with lattice expressions
    Hsu, PY
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 1999, E82D (01) : 156 - 164
  • [4] Indexing Temporal Information for Web Pages
    Jin, Peiquan
    Chen, Hong
    Zhao, Xujian
    Li, Xiaowen
    Yue, Lihua
    [J]. COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2011, 8 (03) : 711 - 737
  • [5] A method for indexing web pages using web bots
    Szymanski, BK
    Chung, MS
    [J]. 2001 INTERNATIONAL CONFERENCES ON INFO-TECH AND INFO-NET PROCEEDINGS, CONFERENCE A-G: INFO-TECH & INFO-NET: A KEY TO BETTER LIFE, 2001, : C1 - C6
  • [6] Indexing by Permeability in Block Structured Web Pages
    Bruno, Emmanuel
    Faessel, Nicolas
    Glotin, Herve
    Le Maitre, Jacques
    Scholl, Michel
    [J]. DOCENG'09: PROCEEDINGS OF THE 2009 ACM SYMPOSIUM ON DOCUMENT ENGINEERING, 2009, : 70 - 73
  • [7] Querying and clustering web pages about persons and organizations
    Ye, SR
    Chua, TS
    Kei, JR
    [J]. IEEE/WIC INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, PROCEEDINGS, 2003, : 344 - 350
  • [8] Ontology based text indexing and querying for the semantic web
    Koehler, Jacob
    Philippi, Stephan
    Specht, Michael
    Rueegg, Alexander
    [J]. KNOWLEDGE-BASED SYSTEMS, 2006, 19 (08) : 744 - 754
  • [9] Structural and Semantic Indexing for Supporting Creation of Multilingual Web Pages
    Urae, Hiroshi
    Tezuka, Taro
    Kimura, Fuminori
    Maeda, Akira
    [J]. INTERNATIONAL MULTICONFERENCE OF ENGINEERS AND COMPUTER SCIENTISTS, IMECS 2012, VOL I, 2012, : 662 - 667
  • [10] A proposed multi criteria indexing and ranking model for documents and web pages on large scale data
    Attia, Mohamed
    Abdel-Fattah, Manal A.
    Khedr, Ayman E.
    [J]. JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (10) : 8702 - 8715