ChemicalTagger: A tool for semantic text-mining in chemistry

Citations: 113
Authors
Hawizy, Lezan [1]
Jessop, David M. [1]
Adams, Nico [2]
Murray-Rust, Peter [1]
Affiliations
[1] Univ Cambridge, Dept Chem, Unilever Ctr Mol Sci Informat, Cambridge CB2 1EW, England
[2] European Bioinformat Inst, Cambridge CB10 1SD, England
Keywords
WEB; XML
DOI
10.1186/1758-2946-3-17
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Background: The primary method for scientific communication is in the form of published scientific articles and theses, which use natural language combined with domain-specific terminology. As such, they contain free-flowing unstructured text. Given the usefulness of data extraction from unstructured literature, we aim to show how this can be achieved for the discipline of chemistry. The highly formulaic style of writing most chemists adopt makes their contributions well suited to high-throughput Natural Language Processing (NLP) approaches. Results: We have developed the ChemicalTagger parser as a medium-depth, phrase-based semantic NLP tool for the language of chemical experiments. Tagging is based on a modular architecture and uses a combination of OSCAR, domain-specific regex and English taggers to identify parts-of-speech. An ANTLR grammar is then used to structure the tagged tokens into tree-based phrases. Using a metric that allows for overlapping annotations, we achieved machine-annotator agreements of 88.9% for phrase recognition and 91.9% for phrase-type identification (Action names). Conclusions: It is possible to parse chemical experimental text using rule-based techniques in conjunction with a formal grammar parser. ChemicalTagger has been deployed for over 10,000 patents and has identified solvents from their linguistic context with >99.5% precision.
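The layered tagging idea described in the abstract (a chemical-name recogniser, then domain-specific regex rules, then a general English tagger) can be illustrated with a minimal Python sketch. This is not ChemicalTagger's code or API: the tag names, the patterns, the small `CHEMICAL_NAMES` set standing in for a recogniser such as OSCAR, the `UNK` fallback standing in for an English part-of-speech tagger, and the order in which the layers are consulted are all assumptions made for illustration.

```python
import re

# Hypothetical stand-in for a chemical named-entity recogniser such as OSCAR.
CHEMICAL_NAMES = {"ethanol", "toluene", "water"}

# Hypothetical domain-specific regex rules; tag names and patterns are
# illustrative only, not the ones used by ChemicalTagger.
REGEX_RULES = [
    ("NN-VOL", re.compile(r"^(mL|ml|L)$")),
    ("CD", re.compile(r"^\d+(\.\d+)?$")),
    ("VB-ADD", re.compile(r"^(add|added|adding)$", re.IGNORECASE)),
    ("VB-STIR", re.compile(r"^(stir|stirred|stirring)$", re.IGNORECASE)),
]


def tag_token(token):
    """Return a tag for one token, trying each tagging layer in turn."""
    # Layer 1: chemical names (assumed here to take priority).
    if token.lower() in CHEMICAL_NAMES:
        return "CHEM"
    # Layer 2: domain-specific regex rules, first match wins.
    for tag, pattern in REGEX_RULES:
        if pattern.match(token):
            return tag
    # Layer 3: a real pipeline would call a general English tagger here;
    # this sketch just marks the token as unknown.
    return "UNK"


def tag_sentence(sentence):
    """Whitespace-tokenise a sentence and tag every token."""
    return [(token, tag_token(token)) for token in sentence.split()]


if __name__ == "__main__":
    for token, tag in tag_sentence("Ethanol 5 mL was added and stirred"):
        print(f"{token}\t{tag}")
```

A grammar-based parser would then consume the resulting tag sequence and group it into phrases; that step is omitted from this sketch.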
Pages: 13