Text/non-text classification of connected components in document images

Cited by: 3
Authors
Julca-Aguilar, Frank D. [1]
Maia, Ana L. L. M. [1, 2]
Hirata, Nina S. T. [1]
Affiliations
[1] Univ Sao Paulo, Inst Math & Stat, Dept Comp Sci, Sao Paulo, Brazil
[2] State Univ Feira de Santana UEFS, Dept Exact Sci, Feira De Santana, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
SEGMENTATION;
DOI
10.1109/SIBGRAPI.2017.66
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text segmentation is an important problem in document analysis applications. We address the problem of classifying connected components of a document image as text or non-text. Inspired by previous work in the literature, we consider as classifier inputs not only common size- and shape-related features extracted from the components but also the component images themselves, with and without context information. Multi-layer perceptrons and convolutional neural networks are used to classify the components. High precision and recall are obtained for both text and non-text components.
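The kind of input the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `component_features`, the `context` margin parameter, and the particular features (bounding-box height and width, area, aspect ratio, fill ratio) are assumptions chosen for illustration. It labels the connected components of a binary image with SciPy, computes a few size and shape features for each, and crops a patch around each component with an optional margin of surrounding context.

```python
import numpy as np
from scipy import ndimage


def component_features(binary, context=0):
    """Return one feature dict per connected component of a binary image.

    `context` (hypothetical parameter) is the number of extra pixels kept
    around each bounding box when cropping the component patch.
    """
    # Label foreground pixels; SciPy uses 4-connectivity by default in 2-D.
    labels, _ = ndimage.label(binary)
    feats = []
    for idx, sl in enumerate(ndimage.find_objects(labels), start=1):
        mask = labels[sl] == idx          # pixels of this component only
        h, w = mask.shape                 # bounding-box size
        area = int(mask.sum())            # number of foreground pixels
        # Expand the bounding box by `context`, clipped to the image borders.
        r0 = max(sl[0].start - context, 0)
        r1 = min(sl[0].stop + context, binary.shape[0])
        c0 = max(sl[1].start - context, 0)
        c1 = min(sl[1].stop + context, binary.shape[1])
        feats.append({
            "height": h,
            "width": w,
            "area": area,
            "aspect_ratio": w / h,
            "fill_ratio": area / (h * w),
            "patch": binary[r0:r1, c0:c1],   # component image with context
        })
    return feats


# Toy example: a 2x4 block and an isolated pixel.
img = np.zeros((6, 8), dtype=np.uint8)
img[1:3, 1:5] = 1
img[4, 6] = 1
feats = component_features(img, context=1)
```

In such a setup, the scalar features could feed a multi-layer perceptron, while the cropped patches (resized to a fixed shape) could feed a convolutional network; the classifiers themselves are omitted here.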
Pages: 450-455
Number of pages: 6
Related papers
50 in total
  • [21] Separation of Text and Non-text in Document Layout Analysis using a Recursive Filter
    Tuan-Anh Tran
    Na, In-Seop
    Kim, Soo-Hyung
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2015, 9 (10): 4072-4091
  • [22] Text non-text classification based on area occupancy of equidistant pixels
    Khan, Tauseef
    Mollah, Ayatullah Faruk
    INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND DATA SCIENCE, 2020, 167: 1889-1900
  • [23] Video Text Binarization using Connected Component Level Non-text Filtering
    Cho, Beom Geun
    Kim, Shin Gon
    Koo, Hyung Il
    2018 INTERNATIONAL CONFERENCE ON ELECTRONICS, INFORMATION, AND COMMUNICATION (ICEIC), 2018: 493-494
  • [24] Text/non-text image classification in the wild with convolutional neural networks
    Bai, Xiang
    Shi, Baoguang
    Zhang, Chengquan
    Cai, Xuan
    Qi, Li
    PATTERN RECOGNITION, 2017, 66: 437-446
  • [25] Classification of Text regions in a Document Image by Analyzing the properties of Connected Components
    Bhowmik, Showmik
    Sarkar, Ram
    PROCEEDINGS OF 2020 IEEE APPLIED SIGNAL PROCESSING CONFERENCE (ASPCON 2020), 2020: 36-40
  • [26] Text Detection on Camera Acquired Document Images using Supervised Classification of Connected Components in Wavelet Domain
    Roy, Udit
    Harit, Gaurav
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012: 270-273
  • [27] Deep features based convolutional neural network model for text and non-text region segmentation from document images
    Umer, Saiyed
    Mondal, Ranjan
    Pandey, Hari Mohan
    Rout, Ranjeet Kumar
    APPLIED SOFT COMPUTING, 2021, 113
  • [28] Classification of regions extracted from scene images by morphological filters in text or non-text using decision tree
    Luz Alves, Wonder Alexandre
    Hashimoto, Ronaldo Fumio
    WSCG 2010: FULL PAPERS PROCEEDINGS, 2010: 165-172
  • [29] Comparison of MRF and CRF for Text/Non-text Classification in Japanese Ink Documents
    Inatani, Soichiro
    Truyen Van Phan
    Nakagawa, Masaki
    2014 14TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2014: 684-689
  • [30] Text/Non-Text Classification in Online Handwritten Documents with Recurrent Neural Networks
    Truyen Van Phan
    Nakagawa, Masaki
    2014 14TH INTERNATIONAL CONFERENCE ON FRONTIERS IN HANDWRITING RECOGNITION (ICFHR), 2014: 23-28