Survey and empirical comparison of different approaches for text extraction from scholarly figures

Authors
Falk Böschen
Tilman Beck
Ansgar Scherp
Affiliations
[1] Kiel University
[2] ZBW - Leibniz Information Centre for Economics
Keywords
Scholarly figures; Text extraction; Comparison; Figure search
DOI
Not available
Abstract
Different approaches have been proposed in the past to address the challenge of extracting text from scholarly figures. However, until recently, no comparative evaluation of the different approaches had been conducted. Thus, we performed an extensive study of the related work and evaluated in total 32 different approaches. In this work, we perform a more detailed comparison of the 7 most relevant approaches described in the literature and extend to 37 systematic linear combinations of methods for extracting text from scholarly figures. Our generic pipeline, consisting of six steps, allows us to freely combine the different possible methods and perform a fair comparison. Overall, we have evaluated 44 different linear pipeline configurations and systematically compared the different methods. We then derived two non-linear configurations and a two-pass approach. We evaluate all pipeline configurations over four datasets of scholarly figures of different origin and characteristics. The quality of the extraction results is assessed using F-measure and Levenshtein distance, and we measure the runtime performance. Our experiments showed that there is a linear configuration that overall shows the best text extraction quality on all datasets. Further experiments showed that the best configuration can be improved by extending it to a two-pass approach. Regarding the runtime, we observed huge differences from very fast approaches to those running for several weeks. Our experiments found the best working configuration for text extraction from our method set. However, they also showed that further improvements regarding region extraction and classification are needed.
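The abstract names F-measure and Levenshtein distance as the quality metrics. The following is a minimal sketch of how the two can be computed for one figure, not the authors' evaluation code. It assumes exact, token-level matching between extracted and reference strings, which the abstract does not specify; the function names `levenshtein` and `f_measure` are hypothetical.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the row we store as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def f_measure(extracted: list[str], reference: list[str]) -> float:
    """Harmonic mean of precision and recall over exactly matching tokens.
    Assumption: a token counts as correct only if it matches verbatim."""
    overlap = sum((Counter(extracted) & Counter(reference)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(extracted)
    recall = overlap / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For example, comparing the extracted tokens `["Revenue", "2010", "2O11"]` against the reference `["Revenue", "2010", "2011"]` gives two exact matches out of three on both sides, while `levenshtein("2O11", "2011")` reports a single-character error. The two metrics are complementary in this sense: F-measure counts whole tokens as right or wrong, and Levenshtein distance shows how far a wrong token is from the reference.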
Pages: 29475-29505
Page count: 30