Unsupervised Font Reconstruction Based on Token Co-occurrence

Times cited: 0
Authors
Cutter, Michael P. [1]
van Beusekom, Joost [1]
Shafait, Faisal [1]
Breuel, Thomas M. [1]
Affiliation
[1] Tech Univ Kaiserslautern, Kaiserslautern, Germany
Keywords
Token Compression; Font Reconstruction; Candidate Fonts; Token Co-occurrence Graph Partitioning; RECOGNITION
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
High-quality conversions of scanned documents into PDF usually rely either on full OCR or on token compression. This paper describes an approach intermediate between the two: it is based on token clustering, but additionally groups tokens into candidate fonts. The approach can potentially yield OCR-like PDFs when the input is of high quality and degrade gracefully to token-based compression when font analysis fails, while preserving full visual fidelity. It relies on an unsupervised algorithm for grouping tokens into candidate fonts: the algorithm constructs a graph based on token proximity and derives token groups by partitioning this graph. In initial experiments on scanned 300 dpi pages containing multiple fonts, this technique reconstructed candidate fonts with 100% accuracy.
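The abstract describes the grouping step only at a high level: build a graph over tokens and partition it into candidate fonts. The following is a minimal sketch of one way this could work, under assumptions that are ours rather than the paper's: every token has already been assigned a class ID by token clustering, tokens in the same word are assumed to share a font (so classes that co-occur within a word are linked), and the graph is partitioned into connected components after dropping edges lighter than a threshold. The functions `build_cooccurrence_graph` and `partition_into_fonts`, the parameter `min_weight`, and the class labels such as `"R:a"` are hypothetical, and the paper's actual partitioning method may differ.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(words):
    """Hypothetical helper: count how often two token classes share a word.

    `words` is a list of words, each a list of token-class IDs produced by
    an (assumed) earlier token-clustering step.
    """
    edges = Counter()
    for word in words:
        # Each unordered pair of distinct classes in the word gets one count.
        for a, b in combinations(sorted(set(word)), 2):
            edges[(a, b)] += 1
    return edges


def partition_into_fonts(words, min_weight=1):
    """Hypothetical sketch: candidate fonts = connected components of the
    co-occurrence graph, ignoring edges whose weight is below min_weight."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Register every class as a node, even if it ends up isolated.
    for word in words:
        for cls in word:
            find(cls)
    for (a, b), weight in build_cooccurrence_graph(words).items():
        if weight >= min_weight:
            union(a, b)

    groups = {}
    for cls in parent:
        groups.setdefault(find(cls), set()).add(cls)
    # Deterministic order: sort groups by their sorted members.
    return sorted(groups.values(), key=sorted)


if __name__ == "__main__":
    # Toy input: "R:" marks classes from a roman font, "I:" from an italic one.
    words = [["R:a", "R:n", "R:d"], ["R:t", "R:h", "R:e"], ["R:a", "R:t"],
             ["I:a", "I:n"], ["I:s", "I:o"], ["I:n", "I:o"]]
    fonts = partition_into_fonts(words)
    print([sorted(g) for g in fonts])
    # → [['I:a', 'I:n', 'I:o', 'I:s'], ['R:a', 'R:d', 'R:e', 'R:h', 'R:n', 'R:t']]
```

Connected components merge two groups as soon as a single edge joins them, so one word that mixes fonts (for example, an italic letter inside a roman word) would fuse two candidate fonts. Raising `min_weight` is one simple guard against such noise, at the cost of splitting rarely seen classes into singleton groups.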
Pages: 143-149
Number of pages: 7
Related papers
50 in total
  • [1] WIKHDRANK: AN UNSUPERVISED APPROACH FOR ENTITY LINKING BASED ON INSTANCE CO-OCCURRENCE
    Fernandez, Norberto
    Fisteus, Jesus A.
    Sanchez, Luis
    Fuentes-Lorenzo, Damaris
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2012, 8 (11): 7519-7541
  • [2] Unsupervised Discovery of Co-occurrence in Sparse High Dimensional Data
    Chum, Ondrej
    Matas, Jiri
    2010 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2010: 3416-3423
  • [3] Unsupervised Heterogeneous Transfer Learning for Partial Co-occurrence Data
    Liu, Shuyu
    Yang, Liu
    Hu, Qinghua
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2021, 30 (03)
  • [4] EFFICIENT UNSUPERVISED MINING FROM NOISY CO-OCCURRENCE DATA
    Mamitsuka, Hiroshi
    NEW MATHEMATICS AND NATURAL COMPUTATION, 2005, 1 (01): 173-193
  • [5] Unsupervised Multimodal Word Discovery Based on Double Articulation Analysis With Co-Occurrence Cues
    Taniguchi, Akira
    Murakami, Hiroaki
    Ozaki, Ryo
    Taniguchi, Tadahiro
    IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2023, 15 (04): 1825-1840
  • [6] Co-occurrence based texture synthesis
    Darzi, Anna
    Lang, Itai
    Taklikar, Ashutosh
    Averbuch-Elor, Hadar
    Avidan, Shai
    COMPUTATIONAL VISUAL MEDIA, 2022, 8 (02): 289-302
  • [9] Supervised and Unsupervised Aspect Category Detection for Sentiment Analysis with Co-occurrence Data
    Schouten, Kim
    van der Weijde, Onne
    Frasincar, Flavius
    Dekker, Rommert
    IEEE TRANSACTIONS ON CYBERNETICS, 2018, 48 (04): 1263-1275
  • [10] Co-occurrence graph-based context adaptation: a new unsupervised approach to word sense disambiguation
    Rahmani, Saeed
    Fakhrahmad, Seyed Mostafa
    Sadreddini, Mohammad Hadi
    DIGITAL SCHOLARSHIP IN THE HUMANITIES, 2021, 36 (02): 449-471