Unsupervised Font Reconstruction Based on Token Co-occurrence

Authors
Cutter, Michael P. [1 ]
van Beusekom, Joost [1 ]
Shafait, Faisal [1 ]
Breuel, Thomas M. [1 ]
Affiliations
[1] Tech Univ Kaiserslautern, Kaiserslautern, Germany
Keywords
Token Compression; Font Reconstruction; Candidate Fonts; Token Co-occurrence Graph Partitioning; RECOGNITION;
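DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract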
High-quality conversion of scanned documents into PDF usually relies on either full OCR or token compression. This paper describes an approach intermediate between the two: it is based on token clustering but additionally groups tokens into candidate fonts. The approach has the potential to yield OCR-like PDFs when the inputs are of high quality and to degrade to token-based compression when the font analysis fails, while preserving full visual fidelity. It rests on an unsupervised algorithm that constructs a graph based on token proximity and derives token groups by partitioning this graph. In initial experiments on scanned 300 dpi pages containing multiple fonts, the technique reconstructs candidate fonts with 100% accuracy.
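The abstract does not say how token proximity is measured or which partitioning method is used, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that each text line is already available as a sequence of token-class identifiers, that proximity means adjacency on a line, and that partitioning is done by dropping weak edges and taking connected components. The function names, the `min_weight` threshold, and the sample data are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_cooccurrence_graph(lines):
    """Count how often two different token classes appear next to each other.

    `lines` is a list of text lines, each a list of token-class ids in
    reading order (an assumed input format, not the paper's).
    Returns a dict mapping frozenset({a, b}) to an adjacency count.
    """
    weights = defaultdict(int)
    for line in lines:
        for a, b in zip(line, line[1:]):
            if a != b:
                weights[frozenset((a, b))] += 1
    return weights


def candidate_fonts(lines, min_weight=2):
    """Partition token classes into groups (candidate fonts).

    Assumption: edges with a count below `min_weight` are treated as
    noise (for example, a font change in the middle of a line), and the
    remaining connected components are returned as groups.
    """
    weights = build_cooccurrence_graph(lines)
    nodes = {token for line in lines for token in line}
    parent = {n: n for n in nodes}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for edge, weight in weights.items():
        if weight >= min_weight:
            a, b = tuple(edge)
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical sample: token classes a1, b1, c1 belong to one font and
# a2, b2, c2 to another; the last line mixes both fonts once.
sample_lines = [
    ["a1", "b1", "c1", "a1"],
    ["c1", "b1", "a1", "b1"],
    ["a2", "b2", "c2", "a2"],
    ["c2", "b2", "a2", "b2"],
    ["c1", "a2"],
]
fonts = candidate_fonts(sample_lines, min_weight=2)
```

With `min_weight=2` the single cross-font adjacency in the last sample line is discarded and the two groups stay separate; with `min_weight=1` that one edge would merge all six token classes into a single group, which illustrates why some form of edge weighting or cutting is needed in a partitioning step of this kind.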
Pages: 143-149
Page count: 7