Survey and empirical comparison of different approaches for text extraction from scholarly figures

Authors
Falk Böschen
Tilman Beck
Ansgar Scherp
Affiliations
[1] Kiel University
[2] ZBW - Leibniz Information Centre for Economics
Keywords
Scholarly figures; Text extraction; Comparison; Figure search
Abstract
Different approaches have been proposed in the past to address the challenge of extracting text from scholarly figures. However, until recently, no comparative evaluation of the different approaches had been conducted. Thus, we performed an extensive study of the related work and evaluated in total 32 different approaches. In this work, we perform a more detailed comparison of the 7 most relevant approaches described in the literature and extend to 37 systematic linear combinations of methods for extracting text from scholarly figures. Our generic pipeline, consisting of six steps, allows us to freely combine the different possible methods and perform a fair comparison. Overall, we have evaluated 44 different linear pipeline configurations and systematically compared the different methods. We then derived two non-linear configurations and a two-pass approach. We evaluate all pipeline configurations over four datasets of scholarly figures of different origin and characteristics. The quality of the extraction results is assessed using F-measure and Levenshtein distance, and we measure the runtime performance. Our experiments showed that there is a linear configuration that overall shows the best text extraction quality on all datasets. Further experiments showed that the best configuration can be improved by extending it to a two-pass approach. Regarding the runtime, we observed huge differences from very fast approaches to those running for several weeks. Our experiments found the best working configuration for text extraction from our method set. However, they also showed that further improvements regarding region extraction and classification are needed.
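The abstract names F-measure and Levenshtein distance as its quality measures. The following is a minimal sketch of how the two could be computed for one figure. It assumes exact string matching between extracted and ground-truth text elements, which the abstract does not specify; the function names `levenshtein` and `f_measure` are illustrative and not taken from the paper.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def f_measure(extracted: list[str], gold: list[str]) -> float:
    """F1 score over text elements, assuming exact-match comparison."""
    # Multiset intersection counts each gold element at most once.
    true_pos = sum((Counter(extracted) & Counter(gold)).values())
    precision = true_pos / len(extracted) if extracted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

The two measures complement each other under this assumption: the F-measure rewards only fully correct elements, while the Levenshtein distance still gives credit to an element that is off by a few characters.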
Pages: 29475 - 29505
Page count: 30