A Rule-based Framework of Metadata Extraction from Scientific Papers

Cited by: 2
Authors
Guo, Zhixin [1]
Jin, Hai [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab, Wuhan 430074, Peoples R China
Keywords
document metadata; information extraction; rule-based approach
DOI
10.1109/DCABES.2011.14
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Most scientific documents on the web are unstructured or semi-structured, which makes automatic metadata extraction an important task. This paper describes a framework for automatic metadata extraction from scientific papers. Based on spatial and visual knowledge, our system extracts the title, authors, and abstract from scientific papers. We use formatting information such as font size and position to guide the extraction process. Experimental results show that our system achieves high accuracy in header metadata extraction, which can effectively assist automatic index creation for digital libraries.
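The abstract says that font size and position guide the extraction of title, authors, and abstract, but this record does not give the actual rules. The following is only a minimal sketch of how such layout-based rules might look, not the authors' framework. The `TextLine` type, the `extract_header` function, and the three rules (largest font is the title, the largest font between the title and the "Abstract" label is the author line, and the abstract runs until a "Keywords" label) are all assumptions made for illustration, as are the font sizes and coordinates in the sample.

```python
import re
from dataclasses import dataclass


@dataclass
class TextLine:
    """Hypothetical layout record for one line on the first page."""
    text: str
    font_size: float
    y: float  # distance from the top of the page; smaller means higher


def extract_header(lines):
    """Apply three illustrative layout rules (assumed, not from the paper)."""
    ordered = sorted(lines, key=lambda line: line.y)

    # Rule 1 (assumed): the title is every line set in the largest font.
    max_size = max(line.font_size for line in ordered)
    title_lines = [line for line in ordered if line.font_size == max_size]
    title = " ".join(line.text.strip() for line in title_lines)
    title_bottom = max(line.y for line in title_lines)

    # Everything positioned below the title is a candidate for the other fields.
    below = [line for line in ordered if line.y > title_bottom]

    # Rule 2 (assumed): the abstract starts at a line labeled "Abstract".
    start = next(
        (i for i, line in enumerate(below)
         if line.text.strip().lower().startswith("abstract")),
        None,
    )

    # Rule 3 (assumed): between title and abstract, the largest font is the
    # author line; smaller lines there are treated as affiliations.
    block = below if start is None else below[:start]
    authors = []
    if block:
        author_size = max(line.font_size for line in block)
        authors = [line.text.strip() for line in block
                   if line.font_size == author_size]

    # Collect abstract text until a "Keywords" label appears.
    parts = []
    if start is not None:
        for line in below[start:]:
            text = line.text.strip()
            if text.lower().startswith("keywords"):
                break
            parts.append(text)
    abstract = re.sub(r"^abstract\s*[-:.]*\s*", "", " ".join(parts),
                      flags=re.IGNORECASE)

    return {"title": title, "authors": authors, "abstract": abstract}


# Sample input with made-up font sizes and positions.
sample = [
    TextLine("A Rule-based Framework of Metadata Extraction", 18, 50),
    TextLine("from Scientific Papers", 18, 72),
    TextLine("Zhixin Guo, Hai Jin", 12, 110),
    TextLine("Huazhong Univ Sci & Technol", 10, 126),
    TextLine("Abstract: Most scientific documents on the web are unstructured.", 9, 170),
    TextLine("This paper describes a framework.", 9, 182),
    TextLine("Keywords: document metadata", 9, 200),
]
result = extract_header(sample)
print(result)
```

A real system would need to obtain the line records from a PDF layout parser and handle multi-column pages, missing labels, and ties in font size, none of which this sketch attempts.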
Pages: 400-404
Page count: 5