Identification and removal of advertisements from yellow page documents

Cited by: 0
Authors
Hashemi, RR [1]
Epperson, C [1]
Jones, S [1]
Jin, L [1]
Talburt, J [1]
Affiliation
[1] Univ Arkansas, Dept Comp Sci, Little Rock, AR 72204 USA
Keywords
OCR of yellow pages; identification of advertisements; hesitation; tracking; removal of advertisements
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
OCR fails to deliver the information embedded in a yellow page document. This failure stems from the fact that a yellow page document includes creative advertisements, multiple columns, and decorative graphics. In this research effort we introduce a set of algorithms that enables us to identify and remove advertisements from a scanned yellow page document. Removal of advertisements is a major step in paving the way for successful OCR of the yellow pages. The scanned image is a grayscale image with 256 gray levels and a resolution of 3300 x 4400. The experimental test shows 98% correct identification and removal of advertisements from the image.
Pages: 94-100
Page count: 7