BlockWeb: an IR Model for Block Structured Web Pages

Cited by: 0
Authors
Bruno, Emmanuel [1 ]
Faessel, Nicolas [2 ]
Le Maitre, Jacques [1 ]
Scholl, Michel [3 ]
Affiliations
[1] Univ Sud Toulon Var, CNRS, LSIS UMR 6168, BP 20132, F-83957 La Garde, France
[2] Univ Paul Cezanne, CNRS, LSIS UMR 6168, F-13397 Marseille, France
[3] CNAM, Cedric Wisdom, F-75141 Paris, France
DOI
10.1109/CBMI.2009.36
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
BlockWeb is a model that we have developed for indexing and querying Web pages according to their content as well as their visual rendering. These pages are split up into blocks, which has several advantages for page indexing and querying: (i) the blocks of a page that are most similar to a query may be returned instead of the page as a whole; (ii) the importance of a block can be taken into account; and (iii) so can the permeability of blocks to the content of their neighboring blocks. In this paper, we present the BlockWeb model and show its value for indexing images of Web pages through an experiment performed on electronic versions of French daily newspapers. We also present the engine we have implemented for block extraction, indexing and querying according to the BlockWeb model.
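To make the three ideas in the abstract concrete (block-level answers, block importance, and permeability to neighboring blocks), here is a toy sketch. It is not the BlockWeb formulas from the paper; every modeling choice is our own assumption: blocks are term-count vectors, permeability is a one-step propagation where block `src` adds `perm[(src, dst)]` times its term counts to the index of block `dst`, and a block's score is its importance times the cosine similarity between the query and its index. The names `block_index`, `rank_blocks`, `perm`, `importance` and the sample page are hypothetical. The example shows how an image block with no text of its own can still be retrieved through a permeable caption block.

```python
import math
from collections import Counter

def block_index(blocks, perm):
    """Hypothetical one-step permeability: a block's index is its own term
    counts plus perm[(src, dst)] times the term counts of each source block."""
    index = {}
    for dst, terms in blocks.items():
        vec = Counter(terms)  # start from the block's own content
        for (src, target), p in perm.items():
            if target == dst and src != dst:
                for term, count in blocks[src].items():
                    vec[term] += p * count  # neighbor content leaks in, scaled by p
        index[dst] = vec
    return index

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_blocks(query, blocks, importance, perm):
    """Score each block as importance * cosine(query, block index); best first."""
    index = block_index(blocks, perm)
    q = Counter(query.lower().split())
    scores = {b: importance[b] * cosine(q, vec) for b, vec in index.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy newspaper page: an image block with no text of its own, its caption,
# and a navigation bar (all data invented for illustration).
blocks = {
    "img1": {},
    "cap1": {"election": 1, "results": 1},
    "nav": {"home": 1, "sports": 1},
}
importance = {"img1": 1.0, "cap1": 0.6, "nav": 0.2}
# perm[(src, dst)]: how much of src's content is propagated into dst's index.
perm = {("cap1", "img1"): 0.8, ("nav", "img1"): 0.1}

ranking = rank_blocks("election", blocks, importance, perm)
for block, score in ranking:
    print(f"{block}: {score:.3f}")
```

With these made-up numbers the image block ranks first (about 0.702), ahead of its caption (about 0.424), while the navigation bar scores 0. A single propagation step keeps the sketch short; a model in which permeated content itself propagates further would need an iterative or linear-system formulation instead.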
Pages: 219-220
Number of pages: 2
Related papers
  • [1] Indexing and querying segmented web pages: the BlockWeb Model
    Bruno, Emmanuel
    Faessel, Nicolas
    Glotin, Herve
    Le Maitre, Jacques
    Scholl, Michel
    [J]. WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2011, 14 (5-6): 623 - 649
  • [2] Indexing and querying segmented web pages: the BlockWeb Model
    Emmanuel Bruno
    Nicolas Faessel
    Hervé Glotin
    Jacques Le Maitre
    Michel Scholl
    [J]. World Wide Web, 2011, 14 : 623 - 649
  • [3] Indexing by Permeability in Block Structured Web Pages
    Bruno, Emmanuel
    Faessel, Nicolas
    Glotin, Herve
    Le Maitre, Jacques
    Scholl, Michel
    [J]. DOCENG'09: PROCEEDINGS OF THE 2009 ACM SYMPOSIUM ON DOCUMENT ENGINEERING, 2009: 70 - 73
  • [4] Block Clustering for Web Pages Categorization
    Charrad, Malika
    Lechevallier, Yves
    ben Ahmed, Mohamed
    Saporta, Gilbert
    [J]. INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING, PROCEEDINGS, 2009, 5788 : 260 - +
  • [5] Automatic template detection for structured web pages
    Lo, Lawrence
    Ng, Vincent To-Yee
    Ng, Patrick
    Chan, Stephen C. F.
    [J]. 2006 10TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, PROCEEDINGS, VOLS 1 AND 2, 2006: 708 - 713
  • [6] Generative Colorization of Structured Mobile Web Pages
    Kikuchi, Kotaro
    Inoue, Naoto
    Otani, Mayu
    Simo-Serra, Edgar
    Yamaguchi, Kota
    [J]. 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), 2023: 3639 - 3648
  • [7] Micro Genre: Building Block of Web Pages
    Kudelka, Milos
    Snasel, Vaclav
    Horak, Zdenek
    Abraham, Ajith
    [J]. NDT: 2009 FIRST INTERNATIONAL CONFERENCE ON NETWORKED DIGITAL TECHNOLOGIES, 2009
  • [8] Extracting Structured Data from Web Pages with Maximum Entropy Segmental Markov Model
    Mengel, Susan
    Jing, Yaoquin
    [J]. WEB INFORMATION SYSTEMS ENGINEERING - WISE 2009, PROCEEDINGS, 2009, 5802 : 219 - 226
  • [9] Analysing Structured Scholarly Data Embedded in Web Pages
    Sahoo, Pracheta
    Gadiraju, Ujwal
    Yu, Ran
    Saha, Sriparna
    Dietze, Stefan
    [J]. SEMANTICS, ANALYTICS, VISUALIZATION: ENHANCING SCHOLARLY DATA, SAVE-SD 2016, 2016, 9792 : 90 - 100
  • [10] Tree-structured template generation for web pages
    Chuang, SL
    Hsu, JYJ
    [J]. IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2004), PROCEEDINGS, 2004: 327 - +