Automatic separation of words in multi-lingual multi-script Indian documents

Times cited: 0

Authors:

Pal, U

Chaudhuri, BB

Source:

PROCEEDINGS OF THE FOURTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, VOLS 1 AND 2 | 1997

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In a multi-lingual country like India, a document may contain text in more than one script. Such a document must be separated by script before each part is fed to the OCR system for that script. This paper describes an automatic approach that separates the words of Roman, Bangla and Devnagari scripts present in a single document. The approach has a tree structure. First, Roman-script words are separated using the 'headline' feature, which is common to Bangla and Devnagari but absent in Roman. Next, Bangla and Devnagari words are separated using some finer characteristics of their character sets, without recognizing individual characters. At present, the system has an overall accuracy of 96.09%.
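The first level of the tree described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. It makes several assumptions:

* Each word is already segmented and binarized into a 0/1 grid.
* A headline is detected with a horizontal projection profile, which is a common way to locate it.
* The thresholds `min_fill` and `search_frac` are hypothetical values.

The abstract does not specify the finer Bangla/Devnagari features. The second level is therefore left to a caller-supplied function, the hypothetical `bangla_vs_devnagari`.

```python
from typing import Callable, List

# Binarized word image: 1 = ink (black) pixel, 0 = background.
Image = List[List[int]]


def has_headline(word: Image, min_fill: float = 0.75, search_frac: float = 0.5) -> bool:
    """Return True if some row in the upper part of the word is mostly ink,
    i.e. it looks like a Bangla/Devnagari headline.

    min_fill and search_frac are hypothetical thresholds, not values from the paper.
    """
    if not word or not word[0]:
        return False
    height, width = len(word), len(word[0])
    # Horizontal projection profile: count of ink pixels in each row.
    profile = [sum(row) for row in word]
    # Only inspect the upper portion of the word, where a headline would lie.
    upper = profile[: max(1, int(height * search_frac))]
    return max(upper) >= min_fill * width


def classify_word(word: Image, bangla_vs_devnagari: Callable[[Image], str]) -> str:
    """Two-level tree: headline test first, then a finer Bangla/Devnagari test."""
    if not has_headline(word):
        return "Roman"
    # Placeholder for the second level; the abstract does not give its features.
    return bangla_vs_devnagari(word)


# Toy example (made-up 4x6 images, not data from the paper).
headline_word = [
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
]
roman_word = [
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 1, 0],
]
print(classify_word(headline_word, lambda w: "Bangla"))  # Bangla
print(classify_word(roman_word, lambda w: "Bangla"))     # Roman
```

The cheap headline test runs first so that the more specific Bangla/Devnagari features are only computed for words that have a headline. A single-letter Roman word with a full-width top stroke (for example "T") could trigger a false headline under this simple rule.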

Pages: 576-579

Page count: 4

Related documents (50 in total):

[41] Word-Level Thirteen Official Indic Languages Database for Script Identification in Multi-script Documents
Obaidullah, Sk Md
Santosh, K. C.
Halder, Chayan
Das, Nibaran
Roy, Kaushik
[J]. RECENT TRENDS IN IMAGE PROCESSING AND PATTERN RECOGNITION (RTIP2R 2016), 2017, 709 : 16 - 27
[42] A Texture based approach to Word-level Script Identification from Multi-script Handwritten Documents
Singh, Pawan Kumar
Khan, Aparajita
Sarkar, Ram
Nasipuri, Mita
[J]. 2014 6TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMMUNICATION NETWORKS, 2014, : 228 - 232
[43] Multi-lingual Transformer Training for Khmer Automatic Speech Recognition
Soky, Kak
Li, Sheng
Kawahara, Tatsuya
Seng, Sopheap
[J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1893 - 1896
[44] A Low Resource Multi-lingual Simultaneous Script Identification and Text Recognition Model
Mukherjee, Jayati
Roy, Utpal
[J]. SN Computer Science, 5 (6)
[45] Automatic Focus Personage Identification in Multi-lingual News Image
Su, Xueping
Zhou, Hangchi
[J]. 2017 INTERNATIONAL CONFERENCE ON THE FRONTIERS AND ADVANCES IN DATA SCIENCE (FADS), 2017, : 74 - 79
[46] Automatic learning of numeral grammars for multi-lingual speech synthesizers
Flach, G
Holzapfel, M
Just, C
Wachtler, A
Wolff, M
[J]. 2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1291 - 1294
[47] Automatic identification of focus personage in multi-lingual news images
Su, Xueping
Zhu, Danyao
Ren, Jie
Raetsch, Matthias
[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (07) : 11015 - 11030
[48] Automatic identification of focus personage in multi-lingual news images
Su, Xueping
Zhu, Danyao
Ren, Jie
Rätsch, Matthias
[J]. Multimedia Tools and Applications, 2021, 80 : 11015 - 11030
[49] Word level multi-script identification
Pati, Peeta Basa
Ramakrishnan, A. G.
[J]. PATTERN RECOGNITION LETTERS, 2008, 29 (09) : 1218 - 1229
[50] Firefighting in a multi-lingual world
Anon
[J]. Fire International, 2002, (194)