A Flexible Approach for Extracting Metadata From Bibliographic Citations

Cited by: 8
Authors
Cortez, Eli [1]
da Silva, Altigran S. [1]
Goncalves, Marcos Andre [2]
Mesquita, Filipe [3]
de Moura, Edleno S. [3]
Affiliations
[1] Univ Fed Amazonas, Dept Comp Sci, BR-69077000 Manaus, Amazonas, Brazil
[2] Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil
[3] Univ Fed Amazonas, Dept Comp Sci, Manaus, Amazonas, Brazil
Keywords
INFORMATION EXTRACTION; WRAPPER INDUCTION; MODEL
DOI
10.1002/asi.21049
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this article we present FLUX-CiM, a novel method for extracting components (e.g., author names, article titles, venues, page numbers) from bibliographic citations. Our method does not rely on patterns encoding the specific delimiters used in a particular citation style. This feature yields a high degree of automation and flexibility and allows FLUX-CiM to extract components from citations in any format. Unlike previous methods, which rely on models learned through user-driven training, our method relies on a knowledge base automatically constructed from an existing set of sample metadata records from a given field (e.g., computer science, health sciences, or social sciences). Such records are usually available on the Web or in other public data repositories. To demonstrate the effectiveness and applicability of the method, we present a series of experiments in which we apply it to extract bibliographic data from citations in articles from different fields. In these experiments, precision and recall exceed 94% for all fields, and the large majority of the citations tested are extracted perfectly. In addition, in a comparison against a state-of-the-art information-extraction method, ours produced superior results without the training phase that method requires. Finally, we present a strategy for using the bibliographic data extracted by FLUX-CiM to automatically update and expand the knowledge base of a given domain. We show that this strategy can be used to achieve good extraction results even when only a very small initial sample of bibliographic records is available for building the knowledge base.
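The abstract describes extraction driven by a knowledge base built from sample metadata records instead of delimiter patterns, plus a strategy for feeding extracted data back into that knowledge base. The Python below is a minimal, hypothetical sketch of that general idea; it is not FLUX-CiM's actual algorithm, whose details the abstract does not give. The names `KnowledgeBase`, `segment`, `label`, and `expand`, the adjacency-based blocking rule, the frequency-share scoring, and the toy records are all illustrative assumptions.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case word tokens; punctuation is discarded, not used as a delimiter."""
    return re.findall(r"\w+", text.lower())


class KnowledgeBase:
    """Hypothetical toy knowledge base built from sample metadata records."""

    def __init__(self):
        self.term_freq = defaultdict(Counter)  # field -> term -> occurrence count
        self.bigrams = set()  # adjacent term pairs seen inside any single value

    def add_record(self, record):
        """Index one record given as a {field: value} dict."""
        for field, value in record.items():
            terms = tokenize(value)
            self.term_freq[field].update(terms)
            self.bigrams.update(zip(terms, terms[1:]))

    def field_score(self, field, terms):
        """Average share of each term's KB occurrences that fall in `field`."""
        total = 0.0
        for t in terms:
            overall = sum(c[t] for c in self.term_freq.values())
            if overall:
                total += self.term_freq[field][t] / overall
        return total / len(terms)


def segment(kb, citation):
    """Assumed blocking rule: keep consecutive tokens together when the pair
    occurs adjacently in some KB value; otherwise start a new block."""
    blocks = []
    for tok in tokenize(citation):
        if blocks and (blocks[-1][-1], tok) in kb.bigrams:
            blocks[-1].append(tok)
        else:
            blocks.append([tok])
    return blocks


def label(kb, citation):
    """Give each block the highest-scoring field, or 'unknown' if none match."""
    result = []
    for block in segment(kb, citation):
        scores = {f: kb.field_score(f, block) for f in kb.term_freq}
        best = max(scores, key=scores.get) if scores else None
        if best is None or scores[best] == 0:
            best = "unknown"
        result.append((best, " ".join(block)))
    return result


def expand(kb, labeled):
    """Hypothetical update rule: add every labeled block back into the KB."""
    for field, text in labeled:
        if field != "unknown":
            kb.add_record({field: text})


if __name__ == "__main__":
    kb = KnowledgeBase()
    # Toy sample records, invented for illustration only.
    kb.add_record({"author": "Ana Silva", "title": "Mining citation graphs",
                   "venue": "Information Systems"})
    kb.add_record({"author": "Joao Souza", "title": "Metadata extraction from text",
                   "venue": "Digital Libraries"})
    labeled = label(kb, "Silva, Ana. Mining citation graphs. Information Systems")
    print(labeled)
    expand(kb, labeled)  # grow the KB with the newly labeled blocks
```

With these two toy records, the example citation splits into four blocks labeled author, author, title, and venue. The reversed name order ("Silva, Ana") is not an adjacent pair in the KB, so the author ends up in two blocks; a fuller method would need some way to reconcile such splits, which this sketch does not attempt.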
Pages: 1144-1158
Page count: 15
Related papers (50 total)
  • [31] An Improved Algorithm for Extracting Research Communities from Bibliographic Data
    Nakamura, Yushi
    Horiike, Toshihiko
    Taira, Yoshimasa
    Sakamoto, Hiroshi
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, 2010, 6193: 338-345
  • [32] Extracting Metadata from Multimedia Content on Facebook as Media Annotations
    Alves, Miguel B.
    Damasio, Carlos Viegas
    Correia, Nuno
    [J]. KNOWLEDGE ENGINEERING AND SEMANTIC WEB, KESW 2015, 2015, 518: 243-252
  • [33] Extracting Metadata from Fundus Images for the Screening of Diabetic Retinopathy
    Hajdu, Andras
    Peto, Tuende
    Biro, Attila
    Harangozo, Roland
    Huelvely, Julianna
    Torok, Zsolt
    Csutak, Adrienne
    [J]. WISP 2009: 6TH IEEE INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING, PROCEEDINGS, 2009: 259-+
  • [34] Extracting the maximum from geographic metadata at the NASA Langley ASDC
    Olson, J
    Rowe, K
    Wang, F
    [J]. 18TH INTERNATIONAL CONFERENCE ON INTERACTIVE INFORMATION AND PROCESSING SYSTEMS (IIPS) FOR METEOROLOGY, OCEANOGRAPHY, AND HYDROLOGY, 2002: 27
  • [35] Weights Estimation in the Completeness Measurement of Bibliographic Metadata
    Diaz de la Paz, Lisandra
    Riestra Collado, Francisco N.
    Garcia Mendoza, Juan L.
    Gonzalez Gonzalez, Luisa M.
    Leiva Mederos, Amed A.
    Taboada Crispi, Alberto
    [J]. COMPUTACION Y SISTEMAS, 2021, 25 (01): 47-65
  • [36] Text Classification based on Limited Bibliographic Metadata
    Denecke, Kerstin
    Risse, Thomas
    Baehr, Thomas
    [J]. 2009 FOURTH INTERNATIONAL CONFERENCE ON DIGITAL INFORMATION MANAGEMENT, 2009: 241-+
  • [37] Who Owns Bibliographic Metadata Created by Libraries?
    Machovec, George
    [J]. JOURNAL OF LIBRARY ADMINISTRATION, 2023, 63 (03): 386-393
  • [38] Recording evidence in bibliographic records and descriptive metadata
    Taniguchi, S
    [J]. JOURNAL OF THE AMERICAN SOCIETY FOR INFORMATION SCIENCE AND TECHNOLOGY, 2005, 56 (08): 872-882
  • [39] Bibliographic notes in the representation metadata of museum objects
    Alonso, Beatriz Tarre
    de Barros, Camila Monteiro
    [J]. SCIRE-REPRESENTACION Y ORGANIZACION DEL CONOCIMIENTO, 2023, 29 (01): 25-30
  • [40] The bibliographic coupling approach to filter the cited and uncited patent citations: a case of electric vehicle technology
    Yeh, Hsi-Yin
    Sung, Yi-Shan
    Yang, Hsiao-Wen
    Tsai, Wan-Chu
    Chen, Dar-Zen
    [J]. SCIENTOMETRICS, 2013, 94 (01): 75-93