Character Detection and Segmentation of Historical Uchen Tibetan Documents in Complex Situations

Cited by: 5
Authors
Zhang, Ce [1, 2]
Wang, Weilan [1 ]
Liu, Huaming [3 ]
Zhang, Guowei [1 ]
Lin, Qiang [4 ]
Affiliations
[1] Northwest Minzu Univ, Key Lab Chinas Ethn Languages & Informat Technol, Minist Educ, Lanzhou 730030, Peoples R China
[2] Chongqing Univ Educ, Sch Artificial Intelligence, Chongqing 400065, Peoples R China
[3] Fuyang Normal Univ, Sch Comp & Informat Engn, Fuyang 236037, Peoples R China
[4] Northwest Minzu Univ, Key Lab Streaming Data Comp & Applicat, Lanzhou 730124, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Image segmentation; Feature extraction; Printing; Character recognition; Licenses; Databases; Writing; Historical Tibetan documents; local baseline detection; character detection; character segmentation; stroke attribution; TEXT-LINE SEGMENTATION; RECOGNITION;
DOI
10.1109/ACCESS.2022.3151886
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Tibetan is a low-resource language, and Tibetan culture carried by historical Tibetan documents is an important part of Chinese civilization. The study of historical Tibetan documents is of great significance to the protection of Tibetan culture and the promotion of Chinese culture. Character segmentation is an important step in image analysis and recognition of historical Tibetan documents. However, the following three challenges prevent solving problems of character segmentation in historical Tibetan documents: 1) the text lines have different degrees of tilt and twist; 2) there are many complex situations such as overlapping, crossing, touching and breaking character strokes; and 3) these documents are written by different people with different stroke styles. To resolve these problems, we propose a character segmentation method based on key feature information for historical Tibetan documents. The proposed method consists of three parts: 1) projection and syllable point location information are used to shorten the text lines of historical Tibetan documents and establish a character block database; 2) the local baseline of the character block is detected by using the location information of syllable points or combined with horizontal projection and straight line detection, and the character block is divided into two areas above and below the baseline, and different segmentation methods are adopted; and 3) in view of the large difference in stroke styles, three stroke attribution distances are used to complete the attribution. The experimental results show that the method proposed in this paper can effectively solve the problem of character segmentation of historical Tibetan documents and achieve a better character segmentation effect, which also provides a reference for the relevant document character segmentation.
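The horizontal-projection step mentioned in part 2 of the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract says the local baseline is found from syllable point locations, or from horizontal projection combined with straight line detection, and only the projection part is shown here. The function names, the toy block, and the assumption that the baseline is the row with the most ink pixels are all hypothetical choices made for illustration.

```python
import numpy as np


def horizontal_projection(block):
    """Count ink pixels in each row of a binary block (1 = ink, 0 = background)."""
    return np.asarray(block).sum(axis=1)


def estimate_baseline_row(block):
    """Assumption for this sketch: the baseline is the row with the most ink."""
    return int(np.argmax(horizontal_projection(block)))


def split_at_baseline(block, baseline_row):
    """Split a block into the area above the baseline and the area from it downward."""
    block = np.asarray(block)
    return block[:baseline_row], block[baseline_row:]


# Hypothetical toy character block: a long horizontal stroke in row 2
# with short strokes above and below it.
toy_block = np.array([
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
])

baseline_row = estimate_baseline_row(toy_block)
above, below = split_at_baseline(toy_block, baseline_row)
```

On real scans with tilted or twisted lines, a single maximum of the projection is unlikely to be reliable by itself, which is presumably why the abstract describes working on shortened character blocks and combining projection with other cues.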
Pages: 25376-25391
Number of pages: 16
Related papers
50 in total
  • [1] Character Segmentation for Historical Uchen Tibetan Document Based on Structure Attributes
    Zhang Ce
    Wang Weilan
    [J]. LASER & OPTOELECTRONICS PROGRESS, 2021, 58 (20)
  • [2] Construction of a Character Dataset for Historical Uchen Tibetan Documents under Low-Resource Conditions
    Zhang, Ce
    Wang, Weilan
    Zhang, Guowei
    [J]. ELECTRONICS, 2022, 11 (23)
  • [3] Touching text line segmentation combined local baseline and connected component for Uchen Tibetan historical documents
    Hu, Pengfei
    Wang, Weilan
    Li, Qiaoqiao
    Wang, Tiejun
    [J]. INFORMATION PROCESSING & MANAGEMENT, 2021, 58 (06)
  • [4] A Touching Character Database from Tibetan Historical Documents to Evaluate the Segmentation Algorithm
    Zhao, Quanchao
    Ma, Long-long
    Duan, Lijuan
    [J]. PATTERN RECOGNITION AND COMPUTER VISION (PRCV 2018), PT IV, 2018, 11259 : 309 - 321
  • [5] Character recognition of Tibetan Historical document in Uchen font: Dataset and bench mark
    Li, Zhenjiang
    Wang, Weilan
    Wang, Yiqun
    Zhang, Qianxue
    [J]. JOURNAL OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING, 2022, 22 (05) : 1779 - 1794
  • [6] A novel method of text line segmentation for historical document image of the uchen Tibetan
    Li, Zhenjiang
    Wang, Weilan
    Chen, Yang
    Hao, Yusheng
    [J]. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2019, 61 : 23 - 32
  • [7] A Recognition Method of the Similarity Character for Uchen Script Tibetan Historical Document Based on DNN
    Wang, Xiaojuan
    Wang, Weilan
    Li, Zhenjiang
    Wang, Yiqun
    Han, Yuehui
    Hao, Zhanjun
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PT III, 2018, 11258 : 52 - 62
  • [8] A Text-Line Segmentation Method for Historical Tibetan Documents Based on Baseline Detection
    Li, Yanxing
    Ma, Longlong
    Duan, Lijuan
    Wu, Jian
    [J]. COMPUTER VISION, PT I, 2017, 771 : 356 - 367
  • [9] Touching Character Segmentation Method for Chinese Historical Documents
    Sun, Xiaolu
    Peng, Liangrui
    Ding, Xiaoqing
    [J]. DOCUMENT RECOGNITION AND RETRIEVAL XVII, 2010, 7534
  • [10] Character Segmentation for Classical Mongolian Words in Historical Documents
    Su, Xiangdong
    Gao, Guanglai
    Wang, Weihua
    Bao, Feilong
    Wei, Hongxi
    [J]. PATTERN RECOGNITION (CCPR 2014), PT II, 2014, 484 : 464 - 473