Information Extraction from Research Papers by Data Integration and Data Validation from Multiple Header Extraction Sources

Authors
Saleem, Ozair [1]
Latif, Seemab [1]
Affiliations
[1] NUST, Dept Comp Software Engn, Coll Telecommun Engn, Islamabad, Pakistan
Keywords
Information Extraction; Header Extraction; Data Pre-Processing
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A massive amount of information is available on the web in the form of research paper publications. Extracting header information such as conference name, title, authors, affiliation, email, keywords, and abstract can be very useful for data mining tasks such as finding research trends in a particular research area or finding collaborations among different research groups or universities. Existing header parser tools identified by Yao et al. [1] include GROBID, ParsCit, Mendeley, HeaderParserService, PDFSSA4MET, PDFMEAT, Zotero, and PaperPile. Of these, the tools that use machine learning algorithms are GROBID, ParsCit, HeaderParserService, and Mendeley. The problem is that no single tool gives 100% accurate results across all sample research papers; each tool outperforms the others on certain individual elements. For this reason, one cannot rely on a single tool to extract all elements. In this paper, we propose a hybrid method for extracting header information from papers using GROBID, ParsCit, and Mendeley. The results of these tools are merged to achieve accurate header extraction. The proposed method was applied to 75 sample research papers and achieved an overall accuracy of 95.97%.
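The abstract does not state how the outputs of the three tools are merged, so the following is only a minimal sketch of one plausible approach: per-field majority voting with a fixed tool order as tie-breaker. The function name `merge_headers`, the `TOOL_PRIORITY` order, the normalization rule, and the dictionary layout are all assumptions made for illustration; they are not taken from the paper and do not call any real GROBID, ParsCit, or Mendeley API.

```python
from collections import Counter

# Hypothetical tie-break order; the paper does not specify one.
TOOL_PRIORITY = ["grobid", "parscit", "mendeley"]


def normalize(value):
    """Lowercase and collapse whitespace so near-identical strings compare equal."""
    return " ".join(value.lower().split())


def merge_headers(outputs):
    """Merge per-tool header fields by majority vote.

    `outputs` maps a tool name to a dict of field name -> extracted string.
    Empty or missing values are ignored. When several values tie for the
    highest vote count, the tool that comes first in TOOL_PRIORITY wins.
    """
    # Tools not listed in TOOL_PRIORITY are considered last, in sorted order.
    order = TOOL_PRIORITY + sorted(t for t in outputs if t not in TOOL_PRIORITY)
    fields = sorted({f for out in outputs.values() for f in out})
    merged = {}
    for field in fields:
        # Keep only the tools that returned a non-empty value for this field.
        candidates = {
            tool: out[field] for tool, out in outputs.items() if out.get(field)
        }
        if not candidates:
            continue
        votes = Counter(normalize(v) for v in candidates.values())
        best = max(votes.values())
        for tool in order:
            if tool in candidates and votes[normalize(candidates[tool])] == best:
                merged[field] = candidates[tool]
                break
    return merged


if __name__ == "__main__":
    # Made-up sample values, not real tool output.
    sample = {
        "grobid": {"title": "A Study of X", "email": ""},
        "parscit": {"title": "a study  of x", "email": "a@b.org"},
        "mendeley": {"title": "Study of X", "email": "a@b.org"},
    }
    print(merge_headers(sample))
```

Voting on normalized strings means that differences in case or spacing do not split the vote, while the original spelling of the winning tool's value is what gets returned. A field-specific priority (for example, trusting one tool more for emails) would be a natural extension if per-element accuracy figures were available.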
Pages: 215-219
Page count: 5