Text and non-text separation in offline document images: a survey

Authors
Showmik Bhowmik
Ram Sarkar
Mita Nasipuri
David Doermann
Affiliations
[1] Jadavpur University
[2] University of Maryland, Institute for Advanced Computer Studies
Keywords
Text/non-text separation; Segmentation; Offline document images; Engineering drawing; Map; Unconstrained handwritten document; Newspaper; Journal; Magazine; Check; Form; Survey
DOI
Not available
Abstract
Separating text from non-text content is an essential processing step in any document analysis system. A clear understanding of the state of the art in text/non-text separation is therefore important for developing efficient document processing systems. This paper first summarizes the technical challenges of text/non-text separation. It then groups offline document images into classes according to the nature of the challenges they pose, to give insight into the various techniques presented in the literature. The pros and cons of these techniques are discussed wherever possible. In addition to describing evaluation protocols and benchmark databases, the paper presents a performance comparison of different methods. Finally, it highlights open research challenges and future directions in this domain.
Pages: 1-20