PWDB_13: A Corpus of Word-Level Printed Document Images from Thirteen Official Indic Scripts

Cited by: 2
Authors
Obaidullah, Sk Md [1]
Halder, Chayan [2]
Das, Nibaran [3]
Roy, Kaushik [2]
Affiliations
[1] Aliah Univ, Dept Comp Sci & Engn, Kolkata, W Bengal, India
[2] West Bengal State Univ, Dept Comp Sci, Kolkata, W Bengal, India
[3] Jadavpur Univ, Dept Comp Sci & Engn, Kolkata, W Bengal, India
Keywords
Document image corpus; Printed script identification; W-R hybrid transform; MLP classifier; Benchmark result; IDENTIFICATION;
DOI
10.1007/978-81-322-2695-6_21
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present PWDB_13, a word-level printed document image corpus covering thirteen official Indic scripts. It consists of 26,000 words, distributed equally across the thirteen script types and collected by an automated process. We propose a realistic classification framework based on four major regions of India, which sets this work apart. Benchmarking is carried out on the printed script identification (PSI) problem, which is highly relevant in multi-script scenarios. The results are impressive considering the volume of the corpus and the intrinsic complexities of Indic scripts. PWDB_13 addresses the lack of a complete document image dataset covering all official Indic scripts and is freely available to researchers for noncommercial use.
Pages: 233-242
Number of pages: 10
Related papers
3 in total
  • [1] A Corpus of Word-Level Offline Handwritten Numeral Images from Official Indic Scripts
    Obaidullah, Sk Md
    Halder, Chayan
    Das, Nibaran
    Roy, Kaushik
    PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATION TECHNOLOGIES, IC3T 2015, VOL 1, 2016, 379 : 703 - 711
  • [2] Word-Level Thirteen Official Indic Languages Database for Script Identification in Multi-script Documents
    Obaidullah, Sk Md
    Santosh, K. C.
    Halder, Chayan
    Das, Nibaran
    Roy, Kaushik
    RECENT TRENDS IN IMAGE PROCESSING AND PATTERN RECOGNITION (RTIP2R 2016), 2017, 709 : 16 - 27
  • [3] A new dataset of word-level offline handwritten numeral images from four official Indic scripts and its benchmarking using image transform fusion
    Obaidullah, Sk Md
    Halder, Chayan
    Das, Nibaran
    Roy, Kaushik
    INTERNATIONAL JOURNAL OF INTELLIGENT ENGINEERING INFORMATICS, 2016, 4 (01) : 1 - 20