Text/non-text classification of connected components in document images

Times cited: 3
Authors
Julca-Aguilar, Frank D. [1 ]
Maia, Ana L. L. M. [1 ,2 ]
Hirata, Nina S. T. [1 ]
Affiliations
[1] Univ Sao Paulo, Inst Math & Stat, Dept Comp Sci, Sao Paulo, Brazil
[2] State Univ Feira de Santana UEFS, Dept Exact Sci, Feira de Santana, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
SEGMENTATION;
DOI
10.1109/SIBGRAPI.2017.66
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text segmentation is an important problem in document-analysis applications. We address the problem of classifying the connected components of a document image as text or non-text. Inspired by previous work in the literature, besides common size- and shape-related features extracted from the components, we also use the component images, both without and with context information, as inputs to the classifiers. Multi-layer perceptrons and convolutional neural networks are used to classify the components. High precision and recall are obtained for both text and non-text components.
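The abstract describes a pipeline that starts from connected components and builds two kinds of classifier input: size/shape features and component images with or without surrounding context. The sketch below illustrates only that input-preparation step on a binary image (1 = ink). It is not the authors' implementation: the abstract does not list the exact features, connectivity, or context window, so 8-connectivity, the four descriptors (bounding box, area, aspect ratio, extent), and the `margin` parameter are assumptions, and all function names are hypothetical. The MLP and CNN classifiers themselves are not reproduced.

```python
from collections import deque


def label_components(img):
    """Label 8-connected ink pixels (value 1); return one pixel list per component."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if img[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited ink pixel.
            pixels, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def shape_features(pixels):
    """Assumed, illustrative size/shape descriptors of one component."""
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    top, left = min(ys), min(xs)
    height = max(ys) - top + 1
    width = max(xs) - left + 1
    area = len(pixels)
    return {
        "bbox": (top, left, height, width),
        "area": area,
        "aspect_ratio": width / height,
        "extent": area / (width * height),  # fraction of the bbox covered by ink
    }


def crop_without_context(pixels):
    """Bounding-box patch containing only this component's own pixels."""
    top, left, height, width = shape_features(pixels)["bbox"]
    patch = [[0] * width for _ in range(height)]
    for y, x in pixels:
        patch[y - top][x - left] = 1
    return patch


def crop_with_context(img, pixels, margin):
    """Bounding box padded by `margin` pixels (clipped to the image), keeping all nearby ink."""
    top, left, height, width = shape_features(pixels)["bbox"]
    r0, c0 = max(0, top - margin), max(0, left - margin)
    r1 = min(len(img), top + height + margin)
    c1 = min(len(img[0]), left + width + margin)
    return [row[c0:c1] for row in img[r0:r1]]


if __name__ == "__main__":
    page = [
        [1, 1, 0, 0, 0, 0],
        [1, 1, 0, 0, 1, 0],
        [0, 0, 0, 0, 1, 0],
        [0, 0, 0, 0, 1, 1],
    ]
    for comp in label_components(page):
        print(shape_features(comp), crop_with_context(page, comp, margin=1))
```

In a setup like the one the abstract outlines, the feature dictionaries would feed a feature-based classifier such as an MLP, while the two kinds of patches (typically resized to a fixed size first) would feed image-based classifiers such as CNNs; the context patch lets a classifier see neighbouring ink, for example adjacent characters on the same text line.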
Pages: 450-455
Number of pages: 6