Identification and removal of advertisements from yellow page documents

Authors
Hashemi, RR [1]
Epperson, C [1]
Jones, S [1]
Jin, L [1]
Talburt, J [1]
Affiliations
[1] Univ Arkansas, Dept Comp Sci, Little Rock, AR 72204 USA
Keywords
OCR of yellow pages; identification of advertisements; hesitation; tracking; removal of advertisements
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
OCRing fails to deliver the information embedded in a yellow page document. This failure stems from the fact that a yellow page document includes creative advertisements, multiple columns, and decorative graphics. In this research effort we introduce a set of algorithms that enables us to identify and remove advertisements from a scanned yellow page document. Removal of advertisements is a major step in paving the way for successful OCRing of the yellow pages. The scanned image is a grayscale image with 256 gray levels and a resolution of 3300 x 4400. The experimental test shows 98% correct identification and removal of advertisements from the image.
Pages: 94-100
Number of pages: 7