Cluster-based Sample Selection for Document Image Binarization

Cited by: 2
Authors
Krantz, Amandus [1]
Westphal, Florian [1]
Affiliations
[1] Blekinge Inst Technol, Dept Comp Sci, Karlskrona, Sweden
Keywords
document image binarization; sample selection; neural networks; computer vision; relative neighborhood graph; competition
DOI
10.1109/ICDARW.2019.40080
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The current state of the art in document image binarization, in terms of performance, is to train artificial neural networks on pre-labelled ground truth data. As such, it faces the same issue as other, more conventional classification problems: it requires a large amount of training data. However, unlike those conventional classification problems, document image binarization requires the binarized ground truth to be either manually crafted or estimated, which can be error-prone and time-consuming. This is where sample selection, the act of selecting training samples based on some method or metric, might help. Reducing the size of the training dataset without impacting binarization performance also reduces the time spent creating the ground truth. This paper proposes a cluster-based sample selection method that uses image similarity metrics and the relative neighbourhood graph to reduce the underlying redundancy of the dataset. The method, implemented with affinity propagation and the structural similarity index, reduces the training dataset by 49.57% on average while reducing binarization performance by only 0.55%.
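The abstract names the relative neighbourhood graph but does not say how it is combined with the clustering step, so the following is only a minimal sketch of the graph's standard definition, not the authors' implementation. Two samples are joined by an edge when no third sample is closer to both of them than they are to each other. The function name, the four-sample matrix, and the choice of `1 - SSIM` as the dissimilarity are assumptions made for illustration.

```python
from itertools import combinations


def relative_neighbourhood_graph(dist):
    """Return the RNG edges for a symmetric distance matrix (list of lists)."""
    n = len(dist)
    edges = []
    for p, q in combinations(range(n), 2):
        # p and q are relative neighbours if no third sample r is closer
        # to both of them than they are to each other.
        blocked = any(
            max(dist[p][r], dist[q][r]) < dist[p][q]
            for r in range(n)
            if r != p and r != q
        )
        if not blocked:
            edges.append((p, q))
    return edges


# Hypothetical dissimilarities between four image samples (e.g. 1 - SSIM).
# These values are made up for illustration and are not from the paper.
example = [
    [0.00, 0.10, 0.50, 0.60],
    [0.10, 0.00, 0.45, 0.55],
    [0.50, 0.45, 0.00, 0.20],
    [0.60, 0.55, 0.20, 0.00],
]
edges = relative_neighbourhood_graph(example)
print(edges)  # [(0, 1), (1, 2), (2, 3)]
```

In this toy matrix the graph keeps only the chain 0-1-2-3: each long-range pair is dropped because an intermediate sample lies closer to both endpoints. This brute-force check is cubic in the number of samples, which is acceptable for a sketch but would need a faster construction for large datasets.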
Pages: 47-52
Page count: 6
Related papers
50 in total
  • [1] Document Image Binarization Based on NFCM
    Tong Li-Jing
    Chen Kan
    Zhang Yan
    Fu Xiao-Ling
    Duan Jian-Yong
    PROCEEDINGS OF THE 2009 2ND INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOLS 1-9, 2009, : 1769 - 1773
  • [2] Better threshold selection approach for document image binarization
    Wang, Qing
    Zhao, Rongchun
    Chi, Zheru
    Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University, 2002, 20 (03): : 396 - 399
  • [3] Dynamic filters selection for textual document image binarization
    Cecotti, Hubert
    Belaid, Abdel
    19TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOLS 1-6, 2008, : 2660 - 2663
  • [4] Cluster-based selection
    Dunbar, JB
    PERSPECTIVES IN DRUG DISCOVERY AND DESIGN, 1997, 7-8 : 51 - 63
  • [5] Unsupervised Cluster-based Band Selection for Hyperspectral Image Classification
    Wu, Jee-Cheng
    Tsuei, Gwo-Chyang
    PROCEEDINGS OF THE 2013 INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE AND ELECTRONICS INFORMATION (ICACSEI 2013), 2013, 41 : 562 - 565
  • [6] Document image binarization based on texture features
    Liu, Y
    Srihari, SN
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1997, 19 (05) : 540 - 544
  • [7] Document image binarization based on texture features
    State Univ of New York at Buffalo, Buffalo, United States
    IEEE Trans Pattern Anal Mach Intell, 5 (540-544):
  • [8] Document image binarization based on stroke enhancement
    Zhu, Yuanping
    Wang, Chunheng
    Dai, Ruwei
    18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 1, PROCEEDINGS, 2006, : 955 - +
  • [9] Selective Cluster-Based Document Retrieval
    Levi, Or
    Raiber, Fiana
    Kurland, Oren
    Guy, Ido
    CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, : 1473 - 1482
  • [10] Document Image Binarization Process
    Prodan, Marcel
    Boiangiu, Costin-Anton
    BRAIN-BROAD RESEARCH IN ARTIFICIAL INTELLIGENCE AND NEUROSCIENCE, 2023, 14 (02): : 93 - 114