A Flexible Approach for Extracting Metadata From Bibliographic Citations

Cited by: 8
Authors
Cortez, Eli [1]
da Silva, Altigran S. [1]
Goncalves, Marcos Andre [2]
Mesquita, Filipe [3]
de Moura, Edleno S. [3]
Affiliations
[1] Univ Fed Amazonas, Dept Comp Sci, BR-69077000 Manaus, Amazonas, Brazil
[2] Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil
[3] Univ Fed Amazonas, Dept Comp Sci, Manaus, Amazonas, Brazil
Keywords
INFORMATION EXTRACTION; WRAPPER INDUCTION; MODEL;
DOI
10.1002/asi.21049
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In this article we present FLUX-CiM, a novel method for extracting components (e.g., author names, article titles, venues, page numbers) from bibliographic citations. Our method does not rely on patterns encoding the specific delimiters used in a particular citation style. This feature yields a high degree of automation and flexibility, and allows FLUX-CiM to extract components from citations in any given format. Unlike previous methods, which are based on models learned from user-driven training, our method relies on a knowledge base automatically constructed from an existing set of sample metadata records from a given field (e.g., computer science, health sciences, social sciences). Such records are usually available on the Web or in other public data repositories. To demonstrate the effectiveness and applicability of our proposed method, we present a series of experiments in which we apply it to extract bibliographic data from citations in articles from different fields. The results of these experiments exhibit precision and recall levels above 94% for all fields, and perfect extraction for the large majority of citations tested. In addition, in a comparison against a state-of-the-art information-extraction method, ours produced superior results without the training phase required by that method. Finally, we present a strategy for using the bibliographic data resulting from the extraction process with FLUX-CiM to automatically update and expand the knowledge base of a given domain. We show that this strategy can be used to achieve good extraction results even if only a very small initial sample of bibliographic records is available for building the knowledge base.
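The general idea in the abstract, labeling citation fragments by looking them up in a knowledge base built from sample metadata records, can be illustrated with a toy sketch. This is not the FLUX-CiM algorithm itself: the function names, the punctuation-based splitting, the frequency-share scoring rule, and the sample records below are all assumptions made for illustration only.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and return its alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_knowledge_base(records):
    """Map each field name to a Counter of the terms seen in that field."""
    kb = defaultdict(Counter)
    for record in records:
        for field, value in record.items():
            kb[field].update(tokenize(value))
    return kb


def split_blocks(citation):
    """Split on generic punctuation (a simplifying assumption, not a
    pattern tied to any one citation style)."""
    return [b.strip() for b in re.split(r"[.,;:()\"]+", citation) if b.strip()]


def label_block(block, kb):
    """Pick the field whose terms best match the block.

    Each term contributes the share of its knowledge-base occurrences
    that fall in the candidate field (hypothetical scoring rule).
    """
    scores = {}
    for field, counts in kb.items():
        score = 0.0
        for term in tokenize(block):
            total = sum(c[term] for c in kb.values())
            if total:
                score += counts[term] / total
        scores[field] = score
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else "unknown"


def extract(citation, kb):
    """Return (block, label) pairs for one citation string."""
    return [(block, label_block(block, kb)) for block in split_blocks(citation)]


# Toy sample records (invented for this sketch, not real publications).
sample_records = [
    {"author": "Ana Souza",
     "title": "Learning wrappers for web pages",
     "venue": "Journal of Data Engineering"},
    {"author": "Joao Lima",
     "title": "Extracting records from web tables",
     "venue": "Journal of Web Systems"},
]

kb = build_knowledge_base(sample_records)
for block, label in extract(
        "Souza, Ana; Learning wrappers for tables. Journal of Web Systems", kb):
    print(f"{label:8s} {block}")
```

Because the labels come from term statistics in the sample records and not from delimiter patterns, the same knowledge base can in principle be applied to citations written in different styles. Blocks with no known terms are left as "unknown" in this sketch.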
Pages: 1144-1158
Page count: 15