Scientific Data Extraction from Oceanographic Papers

Cited by: 1
Authors
Veyhe, Bartal Eyofnsson [1]
Sagi, Tomer [1]
Hose, Katja [1, 2]
Affiliations
[1] Aalborg Univ, Aalborg, Denmark
[2] TU Wien, Vienna, Austria
Keywords
Table extraction; Scientific data; Entity linking
DOI
10.1145/3543873.3587595
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Scientific data collected in the oceanographic domain is invaluable to researchers when performing meta-analyses and examining changes over time in oceanic environments. However, many of the data samples and subsequent analyses published by researchers are not uploaded to a repository, leaving the scientific paper as the only available source. Automated extraction of scientific data is, therefore, a valuable tool for such researchers. Specifically, much of the most valuable data in scientific papers is structured as tables, making these a prime target for information extraction research. Using the data relies on an additional step where the concepts mentioned in the tables, such as names of measures, units, and biological species, are identified within a domain ontology. Unfortunately, state-of-the-art table extraction leaves much to be desired and has not been attempted on a large scale on oceanographic papers. Furthermore, while entity linking in the context of a full paragraph of text has been heavily researched, it is still lacking in this harder task of linking single concepts. In this work, we present an annotated benchmark dataset of data tables from oceanographic papers. We further present the results of an evaluation of the extraction of these tables and the linking of the contained entities to domain and general-purpose knowledge bases using the current state of the art. We highlight the challenges and quantify the performance of current tools for table extraction and table-concept linking.
Pages: 800-804
Page count: 5