Font Group Identification Using Reconstructed Fonts

Cited by: 0
Authors
Cutter, Michael P. [1]
van Beusekom, Joost [1]
Shafait, Faisal [1]
Breuel, Thomas M. [1]
Affiliations
[1] Univ Kaiserslautern, D-67663 Kaiserslautern, Germany
Source
Keywords
Font Reconstruction; Font Identification; Reconstructed Font; Token Matching;
DOI
10.1117/12.873398
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Ideally, digital versions of scanned documents should be represented in a format that is searchable, compressed, highly readable, and faithful to the original. These goals can theoretically be achieved through OCR and font recognition, re-typesetting the document text with the original fonts. However, OCR and font recognition remain hard problems, and many historical documents use fonts that are not available in digital form. It is therefore desirable to reconstruct fonts with vector glyphs that approximate the shapes of the letters that form a font. In this work, we address the grouping of tokens in a token-compressed document into candidate fonts. This permits us to incorporate font information into token-compressed images even when the original fonts are unknown or unavailable in digital format. This paper extends previous work in font reconstruction by proposing and evaluating an algorithm to assign a font to every character within a document. This is a necessary step in representing a scanned document image with a reconstructed font. Through our evaluation method, we have measured a 98.4% accuracy for the assignment of letters to candidate fonts in multi-font documents.
Pages: 8