Separation of Text from Non-Text Doodles of Poet Rabindranath Tagore's Manuscripts

Authors
Chaudhuri, B. B. [1]
Borah, Samarjeet [1]
Saraf, Ankita [1]
Goyal, Alisha [1]
Kumari, Alka [1]
Affiliations
[1] Indian Stat Inst, CVPR Unit, Kolkata 700108, India
Keywords
Text; Non text Doodles; Rabindranath Tagore; Connected Components; pixels; Stroke Width; EXTRACTION; SEGMENTATION;
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The growing availability of internet facilities has made it convenient and fast to mine a warehouse of both historical and contemporary handwritten documents, which has led to continuing research and development in information retrieval algorithms. In such handwritten documents, graphics and images are combined with text and often overlap one another. This paper presents a technique for separating textual data from non-textual information. The technique is based on previously published work and is applied to the manuscripts of the poet Rabindranath Tagore. The approach generates connected components as basic primitives and classifies each as text or non-text by comparing the total number of pixels in the component with the number of its boundary pixels. A window is then generated, and further separation is done on the basis of the stroke width computed for each window. The paper also contains a brief review of some previously published work.
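The connected-component step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives neither the decision rule nor a threshold, so the function names, the 8-connectivity, the 4-neighbour boundary test, the 0.8 threshold, and the assumption that thin pen strokes (text) have a higher share of boundary pixels than filled doodles are all hypothetical choices made for illustration. The stroke-width windowing step is not covered.

```python
from collections import deque


def connected_components(grid):
    """Return 8-connected foreground components as lists of (row, col)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and grid[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                components.append(pixels)
    return components


def boundary_ratio(grid, pixels):
    """Fraction of a component's pixels with a background (or off-image) 4-neighbour."""
    rows, cols = len(grid), len(grid[0])
    boundary = 0
    for y, x in pixels:
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < rows and 0 <= nx < cols) or not grid[ny][nx]:
                boundary += 1
                break  # count each boundary pixel once
    return boundary / len(pixels)


def classify_components(grid, threshold=0.8):
    """Label each component 'text' or 'non-text'.

    Assumption (not from the paper): a component whose boundary share is at
    least `threshold` is treated as thin-stroke text; the value is illustrative.
    """
    results = []
    for pixels in connected_components(grid):
        ratio = boundary_ratio(grid, pixels)
        label = "text" if ratio >= threshold else "non-text"
        results.append((label, len(pixels)))
    return results
```

On a binarized page represented as a list of 0/1 rows, `classify_components(grid)` returns one `(label, pixel_count)` pair per component in raster-scan order. In practice the threshold would have to be tuned on real manuscript scans.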
Pages: 165-169
Page count: 5