Unsupervised Font Reconstruction Based on Token Co-occurrence

Authors
Cutter, Michael P. [1]
van Beusekom, Joost [1]
Shafait, Faisal [1]
Breuel, Thomas M. [1]
Affiliation
[1] Tech Univ Kaiserslautern, Kaiserslautern, Germany
Source
DOCENG2010: PROCEEDINGS OF THE 2010 ACM SYMPOSIUM ON DOCUMENT ENGINEERING | 2010
Keywords
Token Compression; Font Reconstruction; Candidate Fonts; Token Co-occurrence Graph Partitioning; RECOGNITION
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
High-quality conversions of scanned documents into PDF usually rely on either full OCR or token compression. This paper describes an approach intermediate between those two: it is based on token clustering, but additionally groups tokens into candidate fonts. Our approach has the potential to yield OCR-like PDFs when the inputs are of high quality and to degrade to token-based compression when the font analysis fails, while preserving full visual fidelity. It relies on an unsupervised algorithm for grouping tokens into candidate fonts. The algorithm constructs a graph based on token proximity and derives token groups by partitioning this graph. In initial experiments on scanned 300 dpi pages containing multiple fonts, this technique reconstructs candidate fonts with 100% accuracy.
Pages: 143-149
Page count: 7
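
The graph-based grouping step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how proximity is measured or how the graph is partitioned, so the sketch assumes that two tokens are "proximate" when they occur in the same word, and it substitutes a simple stand-in partitioner (connected components after dropping weak edges). The names `build_cooccurrence_graph`, `candidate_fonts`, and `min_weight` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(words):
    """Count how often two token IDs occur in the same word.

    `words` is a list of words, each a list of token (glyph-cluster) IDs.
    Assumption: same-word membership stands in for "token proximity".
    """
    weights = defaultdict(int)
    for word in words:
        for a, b in combinations(sorted(set(word)), 2):
            weights[(a, b)] += 1
    return weights


def candidate_fonts(words, min_weight=1):
    """Group token IDs into candidate fonts.

    Stand-in partitioner: keep edges with weight >= min_weight and return
    the connected components of the remaining graph (union-find).
    """
    weights = build_cooccurrence_graph(words)
    tokens = {t for word in words for t in word}
    parent = {t: t for t in tokens}

    def find(t):
        # Follow parent links to the root, compressing the path on the way.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for (a, b), w in weights.items():
        if w >= min_weight:
            parent[find(a)] = find(b)

    groups = defaultdict(set)
    for t in tokens:
        groups[find(t)].add(t)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Hypothetical token IDs: "r_*" from a roman face, "i_*" from an italic one.
    words = [["r_a", "r_b"], ["r_b", "r_c"], ["i_a", "i_b"], ["i_b", "i_c"]]
    print(candidate_fonts(words))
```

Raising `min_weight` discards rare co-occurrences, which is one plausible way to keep a single mis-clustered token from merging two candidate fonts; the threshold value itself is an assumption, not a figure from the paper.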