Deep Learning Based Tangut Character Recognition

Authors
Zhang, Guangwei [1]
Han, Xiaomang [1]
Affiliations
[1] Shaanxi Normal University, School of History and Civilization, Xi'an, Shaanxi, People's Republic of China
DOI
Not available
Chinese Library Classification number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
The Tangut script, a logographic writing system, was used to write the now-extinct Tangut language of the Western Xia Dynasty. The large body of Tangut historical documents is still recognized mainly by hand by Tangut experts, because the Tangut language has not been used since the 16th century and automatic recognition was not possible in the past. With the help of deep learning, we build an end-to-end Tangut character recognition system to reduce the labor of Tangut experts. The high accuracy of a deep learning system for character recognition depends essentially on a large, well-labeled training dataset. We construct a training dataset containing more than 100,000 labeled Tangut images, which is used to train a deep convolutional neural network (DCNN) to recognize Tangut characters. The Tangut images in the training dataset come from Tangut historical documents and are labeled in a cluster-and-label way to reduce human effort. In our experiments, the DCNN trained on this dataset reaches a validation accuracy of more than 94%. In the future, we will release the training dataset for further study and build an OCR system for transcribing Tangut historical documents automatically.
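The cluster-and-label idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which features or clustering algorithm the authors used, so everything below is an assumption made for illustration: the 2-D feature vectors, the small k-means routine, and the hypothetical `ask_expert` callback standing in for a human annotator. The point it shows is that an expert labels one representative per cluster and that label is propagated to every member, so the number of manual labels equals the number of clusters rather than the number of images.

```python
import math
from collections import defaultdict

def kmeans(points, k, iters=20):
    """Tiny k-means with deterministic farthest-point initialization (illustrative only)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))

    def nearest(p):
        return min(range(k), key=lambda j: math.dist(p, centers[j]))

    for _ in range(iters):
        assign = [nearest(p) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    # Final assignment against the final centers.
    return [nearest(p) for p in points]

def cluster_and_label(features, k, ask_expert):
    """Cluster the features, query the expert once per cluster, propagate the label."""
    clusters = defaultdict(list)
    for idx, c in enumerate(kmeans(features, k)):
        clusters[c].append(idx)
    labels = [None] * len(features)
    for members in clusters.values():
        label = ask_expert(members[0])  # one manual label per cluster
        for idx in members:
            labels[idx] = label
    return labels

# Hypothetical data: 2-D features for five character images, two character classes.
features = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 5.0), (0.05, 0.05)]
truth = ["A", "A", "B", "B", "A"]  # what the expert would answer for each image
queries = []

def ask_expert(i):
    queries.append(i)  # record how often the expert is consulted
    return truth[i]

labels = cluster_and_label(features, k=2, ask_expert=ask_expert)
print(labels, len(queries))
```

In practice the propagated labels would still need spot checks, since a cluster can mix visually similar characters; the sketch ignores that step.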
Pages: 437-441
Page count: 5