A Multilingual Handwritten Character Dataset: T-H-E Dataset

Cited by: 0
Authors
Bartos, Gaye Ediboglu [1]
Hoscan, Yasar [1]
Kauer, Andras [2]
Hajnal, Eva [3]
Affiliations
[1] Eskisehir Tech Univ, Dept Comp Engn, 2 Eylul Campus, TR-26555 Eskisehir, Turkey
[2] Szekesfehervari SzC Szechenyi Istvan Secondary Te, Budai Ut 45, H-8000 Szekesfehervar, Hungary
[3] Obuda Univ, Alba Regia Tech Fac, Budai Ut 45, H-8000 Szekesfehervar, Hungary
Keywords
public dataset; handwritten character dataset; offline character recognition; OCR; multilingual; RECOGNITION; DATABASE;
DOI
Not available
Chinese Library Classification number
T [Industrial Technology];
Discipline classification code
08;
Abstract
The absence of handwritten special Latin character datasets prompted the creation of the T-H-E Dataset (Turkish-Hungarian-English handwritten character dataset), which contributes to the recognition of multilingual handwritten texts. This paper presents a public-domain dataset of handwritten Turkish, Hungarian and English characters collected from 200 participants. The T-H-E Dataset comprises 78 different letters represented in 156,000 binary characters, including both upper- and lower-case versions. The dataset can be downloaded in six different versions, enabling users to combine the different alphabets for different recognition purposes. The dataset is evaluated by applying the same deep learning architecture to the T-H-E Dataset and the EMNIST dataset. The dataset is publicly available at https://github.com/bartosgaye/thedataset.
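The figures in the abstract can be cross-checked with a short sketch. It assumes, as an illustration only, that the 78 classes are the union of the single-character letters of the English, Turkish and Hungarian alphabets in both cases, and that samples are spread evenly over classes and participants. Neither assumption is stated in the record, and the class list in the repository may be organized differently.

```python
import unicodedata


def letters(s: str) -> set[str]:
    """Return the set of letters in s, normalized to single code points (NFC)."""
    return set(unicodedata.normalize("NFC", s))


# Assumed alphabets (illustrative, not taken from the paper).
# Hungarian digraphs such as "cs" or "sz" are excluded: only single characters.
lower = (
    letters("abcdefghijklmnopqrstuvwxyz")             # English, 26 letters
    | letters("abcçdefgğhıijklmnoöprsştuüvyz")        # Turkish, 29 letters
    | letters("aábcdeéfghiíjklmnoóöőpqrstuúüűvwxyz")  # Hungarian, 35 letters
)

# Uppercase forms are listed explicitly because str.upper() maps both
# "i" and "ı" to "I", which would merge two distinct Turkish letters.
upper = (
    letters("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    | letters("ABCÇDEFGĞHIİJKLMNOÖPRSŞTUÜVYZ")
    | letters("AÁBCDEÉFGHIÍJKLMNOÓÖŐPQRSTUÚÜŰVWXYZ")
)

TOTAL_SAMPLES = 156_000  # from the abstract
PARTICIPANTS = 200       # from the abstract

num_classes = len(lower) + len(upper)
samples_per_class = TOTAL_SAMPLES // num_classes
samples_per_writer_per_class = samples_per_class // PARTICIPANTS

print(num_classes)                   # 78
print(samples_per_class)             # 2000
print(samples_per_writer_per_class)  # 10
```

Under these assumptions the counts are mutually consistent: 39 lower-case plus 39 upper-case letters give 78 classes, and 156,000 samples correspond to 2,000 per class, or 10 per participant per letter.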
Pages: 141-160
Page count: 20