Gradual OCR: An Effective OCR Approach Based on Gradual Detection of Texts

Cited by: 0
Authors
Park, Youngki [1]
Shin, Youhyun [2]
Affiliations
[1] Chuncheon Natl Univ Educ, Dept Comp Educ, Chunchon 24328, South Korea
[2] Incheon Natl Univ, Dept Comp Sci & Engn, Incheon 22012, South Korea
Funding
National Research Foundation, Singapore;
Keywords
optical character recognition; gradual OCR; gradual text detection; gradual low-quality filtering; RECOGNITION;
DOI
10.3390/math11224585
Chinese Library Classification
O1 [Mathematics];
Subject classification codes
0701; 070101;
Abstract
In this paper, we present a novel approach to optical character recognition that incorporates various supplementary techniques, including the gradual detection of texts and gradual filtering of inaccurately recognized texts. To minimize false negatives, we attempt to detect all text by incrementally lowering the relevant thresholds. To mitigate false positives, we implement a novel filtering method that dynamically adjusts based on the confidence levels of recognized texts and their corresponding detection thresholds. Additionally, we use straightforward yet effective strategies to enhance the optical character recognition accuracy and speed, such as upscaling, link refinement, perspective transformation, the merging of cropped images, and simple autoregression. Given our focus on Korean chart data, we compile a mix of real-world and artificial Korean chart datasets for experimentation. Our experimental results show that our approach outperforms Tesseract by approximately 7 to 15 times and EasyOCR by 3 to 5 times in accuracy, as measured using a Jaccard similarity-based error rate on our datasets.
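The gradual detection and filtering idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `detect` and `recognize` callables, the box format, the threshold schedule, and the rule that ties the required confidence to the detection threshold are all assumptions made for the example. The abstract also does not give the exact form of the Jaccard similarity-based error rate, so `jaccard_error` below uses the plain set-based definition as a stand-in.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

# Hypothetical types for illustration only.
Box = Tuple[int, int, int, int]                  # (x0, y0, x1, y1)
Detector = Callable[[float], List[Box]]          # detection threshold -> boxes
Recognizer = Callable[[Box], Tuple[str, float]]  # box -> (text, confidence)


def overlaps(a: Box, b: Box) -> bool:
    """Return True if two axis-aligned boxes share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def gradual_ocr(
    detect: Detector,
    recognize: Recognizer,
    thresholds: Sequence[float] = (0.7, 0.5, 0.3),
    base_conf: float = 0.5,
) -> List[Tuple[Box, str]]:
    """Detect text in passes with progressively lower detection thresholds.

    Assumed filtering rule: the lower the detection threshold that produced
    a box, the higher the recognition confidence required to keep its text.
    """
    accepted: List[Tuple[Box, str]] = []
    strictest = max(thresholds)
    for t in sorted(thresholds, reverse=True):
        # Required confidence grows as the detection threshold is relaxed.
        min_conf = min(1.0, base_conf + (strictest - t))
        for box in detect(t):
            # Skip regions already covered by a previously accepted box.
            if any(overlaps(box, kept) for kept, _ in accepted):
                continue
            text, conf = recognize(box)
            if conf >= min_conf:
                accepted.append((box, text))
    return accepted


def jaccard_error(predicted: Iterable[str], truth: Iterable[str]) -> float:
    """Return 1 - |P & T| / |P | T| over the sets of recognized strings."""
    p, t = set(predicted), set(truth)
    if not p and not t:
        return 0.0
    return 1.0 - len(p & t) / len(p | t)
```

In this sketch, a box that appears only at the most permissive threshold must be recognized with high confidence to survive, which is one simple way to trade fewer false negatives against more false positives. A real pipeline would plug an actual detector and recognizer into `detect` and `recognize`.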
Pages: 20