Font Group Identification Using Reconstructed Fonts

Authors
Cutter, Michael P. [1]
van Beusekom, Joost [1]
Shafait, Faisal [1]
Breuel, Thomas M. [1]
Affiliation
[1] Univ Kaiserslautern, D-67663 Kaiserslautern, Germany
Keywords
Font Reconstruction; Font Identification; Reconstructed Font; Token Matching;
DOI
10.1117/12.873398
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Ideally, digital versions of scanned documents should be represented in a format that is searchable, compressed, highly readable, and faithful to the original. In theory, these goals can be achieved through OCR and font recognition, followed by re-typesetting the document text in the original fonts. However, OCR and font recognition remain hard problems, and many historical documents use fonts that are not available in digital form. It is therefore desirable to reconstruct fonts as vector glyphs that approximate the shapes of the letters in the original font. In this work, we address the grouping of tokens in a token-compressed document into candidate fonts. This permits us to incorporate font information into token-compressed images even when the original fonts are unknown or unavailable in digital format. This paper extends previous work in font reconstruction by proposing and evaluating an algorithm that assigns a font to every character within a document, a necessary step toward representing a scanned document image with a reconstructed font. Using our evaluation method, we measured 98.4% accuracy for the assignment of letters to candidate fonts in multi-font documents.
Pages: 8
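Illustrative note: the abstract reports an accuracy for assigning letters to candidate fonts but does not describe the evaluation procedure in this record. The sketch below shows one plausible way such a figure could be computed, assuming each token has a ground-truth font label and each candidate-font group is mapped to the font that occurs most often within it. The function name `assignment_accuracy` and the sample data are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def assignment_accuracy(predicted_groups, true_fonts):
    """Fraction of tokens that agree with the majority font of their group.

    Assumption (not from the paper): each candidate-font group is mapped to
    the ground-truth font that occurs most often inside it, and a token counts
    as correct when its own ground-truth font equals that majority font.
    """
    if len(predicted_groups) != len(true_fonts):
        raise ValueError("inputs must have the same length")
    if not true_fonts:
        raise ValueError("at least one token is required")

    # Count ground-truth fonts inside every candidate group.
    fonts_per_group = defaultdict(Counter)
    for group, font in zip(predicted_groups, true_fonts):
        fonts_per_group[group][font] += 1

    # Tokens matching the majority font of their group are counted as correct.
    correct = sum(
        counts.most_common(1)[0][1] for counts in fonts_per_group.values()
    )
    return correct / len(true_fonts)


# Hypothetical data: eight tokens, two candidate groups, one misassigned token.
predicted = [0, 0, 0, 1, 1, 1, 1, 0]
truth = ["serif", "serif", "serif", "sans", "sans", "sans", "serif", "serif"]
accuracy = assignment_accuracy(predicted, truth)
print(f"assignment accuracy: {accuracy:.3f}")
```

A majority-vote mapping is used here because candidate fonts are unnamed groups, so some rule is needed to compare them against named ground-truth fonts; other mappings (for example a one-to-one matching between groups and fonts) would give a stricter score.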