Text/non-text classification of connected components in document images

Cited by: 3
Authors
Julca-Aguilar, Frank D. [1]
Maia, Ana L. L. M. [1, 2]
Hirata, Nina S. T. [1]
Affiliations
[1] Univ Sao Paulo, Inst Math & Stat, Dept Comp Sci, Sao Paulo, Brazil
[2] State Univ Feira de Santana UEFS, Dept Exact Sci, Feira De Santana, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
SEGMENTATION
DOI
10.1109/SIBGRAPI.2017.66
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text segmentation is an important problem in document analysis applications. We address the problem of classifying connected components of a document image as text or non-text. Inspired by previous work in the literature, we consider, besides common size- and shape-related features extracted from the components, the component images themselves, with and without context information, as classifier inputs. Multi-layer perceptrons and convolutional neural networks are used to classify the components. High precision and recall are obtained for both text and non-text components.
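As a rough illustration of the first stage the abstract mentions (extracting connected components and computing size and shape features for them), a minimal sketch follows. It is not the authors' implementation: the 8-connectivity choice, the function names `connected_components` and `shape_features`, the particular features (width, height, area, aspect ratio, fill ratio), and the toy `page` image are all assumptions made for illustration. The classification stage (multi-layer perceptron or convolutional network) is not shown.

```python
from collections import deque


def connected_components(image):
    """Group foreground pixels (non-zero) into 8-connected components.

    `image` is a list of rows of 0/1 values (a hypothetical binarized page).
    Returns a list of components, each a list of (row, col) pixels.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def shape_features(pixels):
    """Illustrative size/shape features; the paper's exact feature set is not given here."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    area = len(pixels)
    return {
        "width": width,
        "height": height,
        "area": area,
        "aspect_ratio": width / height,
        "fill_ratio": area / (width * height),  # share of the bounding box covered
    }


# Toy binary image with two separate components (assumed data).
page = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
features = [shape_features(c) for c in connected_components(page)]
```

Feature dictionaries of this kind could serve as one input representation for a text/non-text classifier; the abstract indicates that the component images themselves, with and without surrounding context, are also used as inputs.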
Pages: 450-455
Number of pages: 6
Related papers
50 in total
  • [1] Text and non-text separation in offline document images: a survey
    Showmik Bhowmik
    Ram Sarkar
    Mita Nasipuri
    David Doermann
    International Journal on Document Analysis and Recognition (IJDAR), 2018, 21 : 1 - 20
  • [2] Connected Operators for Non-text Object Segmentation in Grayscale Document Images
    Mysore, Sheshera
    Gupta, Manish Kumar
    Belhe, Swapnil
    PROCEEDINGS OF INTERNATIONAL CONFERENCE ON COMPUTER VISION AND IMAGE PROCESSING, CVIP 2016, VOL 1, 2017, 459 : 399 - 407
  • [3] A Novel Method for Text and Non-Text Segmentation in Document Images
    Deivalakshmi, S.
    Palanisamy, P.
    Vishwanathan, Gayatri
    2013 INTERNATIONAL CONFERENCE ON COMMUNICATIONS AND SIGNAL PROCESSING (ICCSP), 2013: 255 - 259
  • [4] Text and non-text separation in offline document images: a survey
    Bhowmik, Showmik
    Sarkar, Ram
    Nasipuri, Mita
    Doermann, David
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2018, 21 (1-2) : 1 - 20
  • [5] Fast Text vs. Non-text Classification of Images
    Kralicek, Jiri
    Matas, Jiri
    DOCUMENT ANALYSIS AND RECOGNITION, ICDAR 2021, PT IV, 2021, 12824 : 18 - 32
  • [6] Text and Non-Text Region Identification Using Texture and Connected Components
    Vidyarthi, Ankit
    Mittal, Namita
    Kansal, Ankita
    2014 INTERNATIONAL CONFERENCE ON SIGNAL PROPAGATION AND COMPUTER TECHNOLOGY (ICSPCT 2014), 2014: 604 - 609
  • [7] Automatic Extraction of Text and Non-text Information Directly from Compressed Document Images
    Javed, Mohammed
    Nagabhushan, P.
    Chaudhuri, Bidyut B.
    PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS 2016), 2017, 552 : 38 - 46
  • [8] User interface for text and non-text classification
    Thanh Thi Xuan Lam
    Anh Duc Le
    Nakagawa, Masaki
    2019 INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION WORKSHOPS (ICDAR 2019 WORKSHOP) AND 2ND INTERNATIONAL WORKSHOP ON HUMAN-DOCUMENT INTERACTION, VOL 3, 2019: 1 - 5
  • [9] A Chinese Document Layout Analysis Based on Non-text Images
    Fu Xiaoling
    Li Xiaofeng
    2009 INTERNATIONAL FORUM ON COMPUTER SCIENCE-TECHNOLOGY AND APPLICATIONS, VOL 1, PROCEEDINGS, 2009: 326 - 328
  • [10] Automatic Discrimination of Text and Non-Text Natural Images
    Zhang, Chengquan
    Yao, Cong
    Shi, Baoguang
    Bai, Xiang
    2015 13TH IAPR INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION (ICDAR), 2015: 886 - 890