Automatic Multi-lingual Script Recognition Application

Authors
Abu-Ain, Waleed Abdel Karim [1]
Abdullah, Siti Norul Huda Sheikh [2]
Omar, Khairuddin [3]
Abd Rahman, Siti Zaharah [2]
Affiliations
[1] King Abdulaziz Univ, Jeddah, Saudi Arabia
[2] Univ Kebangsaan Malaysia, Ctr Cyber Secur, Fac Informat Sci & Technol, Bangi, Malaysia
[3] Univ Kebangsaan Malaysia, Ctr Artificial Intelligence Technol, Fac Informat Sci & Technol, Bangi, Malaysia
Keywords
Automatic Multi-lingual Script Recognition (AMSR); feature extraction; statistical texture analysis; Grey-Level Co-occurrence Matrix (GLCM); Local Binary Pattern (LBP)
DOI
10.17576/gema-2018-1803-12
Chinese Library Classification number
H [Language and Writing]
Discipline classification code
05
Abstract
Document Image Analysis and Recognition (DIAR) techniques are used to recognize text components and convert them into an editable format. A script is a set of graphical representations used to express a particular writing system, or a subset of one. A single language may adopt the writing styles of more than one script family: old Malay (Jawi), for example, uses the Arabic script, whereas modern Malay uses the Roman script. This research covers seven major scripts in handwritten form: Arabic, Devanagari, Hebrew, Thai, Greek, Cyrillic and Korean. Automatic Multi-lingual Script Recognition (AMSR) is one of the main challenges in the DIAR domain. To date, only a few attempts have been made at automated script identification for off-line handwritten document images. Most available AMSR applications handle only printed documents and script types, neglecting handwritten and multilingual documents. The objective of this study is to propose a multilingual AMSR framework. The framework is tested on the Multilingual-HW datasets, which contain more than seven international unconstrained handwritten scripts, using the Grey-Level Co-occurrence Matrix (GLCM) and Local Binary Pattern (LBP) methods. The average accuracies of the two methods are about 97.01% and 85.29%, respectively. The proposed framework is expected to benefit communities that need to sort multilingual documents automatically. The research could also be extended to document forensics, or used by international relations agencies to identify unknown native documents.
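The abstract names GLCM and LBP as the two texture descriptors but does not give their parameters. As a rough illustration only, the sketch below computes a normalised GLCM (with its contrast statistic) and a basic 8-neighbour LBP histogram using NumPy. The pixel offset, number of grey levels, radius-1 neighbourhood, and the function names `glcm`, `glcm_contrast` and `lbp_histogram` are all assumptions made for this example; they are not taken from the paper, and the classifier applied to these features is not described in the abstract.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Normalised grey-level co-occurrence matrix for one offset (dx, dy >= 0).

    Assumes `image` holds integer grey levels in the range [0, levels).
    """
    img = np.asarray(image).astype(np.intp)
    h, w = img.shape
    ref = img[0:h - dy, 0:w - dx]   # reference pixels
    nbr = img[dy:h, dx:w]           # neighbours at the chosen offset
    counts = np.zeros((levels, levels), dtype=np.float64)
    # Accumulate one count per (reference level, neighbour level) pair.
    np.add.at(counts, (ref.ravel(), nbr.ravel()), 1)
    return counts / counts.sum()

def glcm_contrast(p):
    """Contrast statistic: sum over (i, j) of p[i, j] * (i - j)^2."""
    i, j = np.indices(p.shape)
    return float(np.sum(p * (i - j) ** 2))

def lbp_histogram(image):
    """Normalised 256-bin histogram of basic 3x3 LBP codes (interior pixels only)."""
    img = np.asarray(image).astype(np.int64)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    # Eight neighbours, clockwise from the top-left pixel (an arbitrary but fixed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(centre)
    for bit, (oy, ox) in enumerate(offsets):
        nbr = img[1 + oy:h - 1 + oy, 1 + ox:w - 1 + ox]
        # Set this bit where the neighbour is at least as bright as the centre.
        codes |= (nbr >= centre).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()

# Toy 4x4 image with four grey levels (illustrative data, not from the paper).
toy = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
p = glcm(toy, levels=4)
features = np.concatenate([[glcm_contrast(p)], lbp_histogram(toy)])
```

In practice a feature vector of this kind would be computed per document image (or per text block) and passed to a classifier; library implementations such as those in scikit-image offer more options, including multiple offsets and rotation-invariant LBP variants.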
Pages: 203-221
Page count: 19