Information Extraction from Nanotoxicity Related Publications

Cited by: 0
Authors
Xiao, Lemin [1 ]
Tang, Kaizhi [1 ]
Liu, Xiong [1 ]
Yang, Hui [1 ]
Chen, Zheng [1 ]
Xu, Roger [1 ]
Affiliation
[1] Intelligent Automat Inc, Rockville, MD 20855 USA
Keywords
Nanoinformatics; information extraction; named entity recognition; relation extraction; nanotoxicity; data mining
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
High-quality experimental data are important for developing predictive models of nanomaterial environmental impact (NEI). Because raw data from experimental laboratories and manufacturing workplaces are usually proprietary and small in scale, extracting information from publications is an attractive alternative for collecting data. We developed an information extraction system that extracts useful information from full-text nanotoxicity-related publications. The system consists of five components: transformation of raw data into a machine-readable format, data preprocessing, ontology-based named entity recognition, rule-based numerical attribute extraction from both tables and unstructured text, and relation extraction among entities and attributes. Applied to a dataset of 94 publications, the system achieved acceptable accuracy. Storing the extracted data in a table organized by the relations among the data yields a dataset that can be used to predict nanomaterial environmental impact. Such a system is unique in the current nanomaterial community, and it can help nanomaterial scientists and practitioners quickly locate the information they need without spending a lot of time reading articles.
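The last three components named in the abstract (entity recognition, numerical attribute extraction, relation extraction) can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny LEXICON stands in for an ontology, NUMERIC_RULE is a single hypothetical rule, and linking each value to the nearest preceding entity in the same sentence is an assumed heuristic chosen only for illustration.

```python
import re

# Hypothetical mini-lexicon standing in for a real ontology (assumption).
LEXICON = {
    "nanomaterial": ["TiO2", "ZnO", "silver nanoparticles"],
    "endpoint": ["cell viability", "LC50"],
}

# Hypothetical rule: a number followed by one of a few known units.
NUMERIC_RULE = re.compile(r"(\d+(?:\.\d+)?)\s*(nm|mg/L|%)")


def find_entities(sentence):
    """Return (type, term, position) for each lexicon term found in the sentence."""
    found = []
    lowered = sentence.lower()
    for etype, terms in LEXICON.items():
        for term in terms:
            for m in re.finditer(re.escape(term.lower()), lowered):
                found.append((etype, term, m.start()))
    return found


def find_attributes(sentence):
    """Return (value, unit, position) for each number-plus-unit match."""
    return [
        (float(m.group(1)), m.group(2), m.start())
        for m in NUMERIC_RULE.finditer(sentence)
    ]


def relate(sentence):
    """Link each numeric attribute to the nearest entity mentioned before it."""
    entities = find_entities(sentence)
    rows = []
    for value, unit, pos in find_attributes(sentence):
        preceding = [e for e in entities if e[2] < pos]
        if not preceding:
            continue  # no entity to attach this value to
        etype, term, _ = max(preceding, key=lambda e: e[2])
        rows.append({"entity": term, "type": etype, "value": value, "unit": unit})
    return rows


if __name__ == "__main__":
    for row in relate("ZnO particles of 20 nm reduced cell viability to 45 %."):
        print(row)
```

Each returned row corresponds to one line of the relation-organized table the abstract describes; a real system would need a full ontology, table parsing, and more robust relation rules.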
Pages: 6