Terminologies for text-mining: an experiment in the lipoprotein metabolism domain

Cited by: 13
Authors
Alexopoulou, Dimitra [1]
Waechter, Thomas [1]
Pickersgill, Laura [2]
Eyre, Cecilia [3]
Schroeder, Michael [1]
Affiliations
[1] Tech Univ Dresden, Ctr Biotechnol BIOTEC, D-01062 Dresden, Germany
[2] Unilever Corp Res, Colworth MK44 1LQ, England
[3] Unilever Safety & Environm Assurance Ctr, Colworth MK44 1LQ, England
DOI
10.1186/1471-2105-9-S4-S2
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: The engineering of ontologies, especially with a view to text-mining use, is still a new research field. No well-defined theory or technology for ontology construction exists yet. Many of the ontology design steps remain manual and are based on personal experience and intuition. However, there are a few efforts toward the automatic construction of ontologies in the form of extracted lists of terms and the relations between them.

Results: We share experience acquired during the manual development of a lipoprotein metabolism ontology (LMO) to be used for text-mining. We compare the manually created ontology terms with the terminology derived automatically by four different automatic term recognition (ATR) methods. The top 50 predicted terms contain up to 89% relevant terms. For the top 1000 terms, the best method still generates 51% relevant terms. A corpus of 3066 documents contains 53% of the LMO terms, and 38% can be generated with one of the methods.

Conclusions: Given high precision, automatic methods can help decrease development time and provide significant support for the identification of domain-specific vocabulary. The coverage of the domain vocabulary depends strongly on the underlying documents. Ontology development for text mining should be performed in a semi-automatic way, taking ATR results as input and following the guidelines we described.

Availability: The TFIDF term recognition is available as a Web Service, described at http://gopubmed4.biotec.tu-dresden.de/IdavollWebService/services/CandidateTermGeneratorService?wsdl.
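The evaluation summarized in the abstract (the share of relevant terms among the top 50 or top 1000 predictions) corresponds to precision at a rank cutoff. The following is a minimal Python sketch of that idea, paired with a simple TF-IDF style ranking of candidate terms. It is not the authors' implementation or their Web Service: the function names `tfidf_rank` and `precision_at_k`, the whitespace tokenization, the restriction to single-word terms, and the use of a separate background corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, single-word terms only.
    return text.lower().split()


def tfidf_rank(domain_docs, background_docs):
    """Rank candidate terms from the domain corpus, best first.

    Score = term frequency in the domain corpus
            * log(N / document frequency over domain + background docs).
    """
    tf = Counter(tok for doc in domain_docs for tok in tokenize(doc))
    doc_sets = [set(tokenize(doc)) for doc in domain_docs + background_docs]
    n_docs = len(doc_sets)
    scores = {}
    for term, freq in tf.items():
        df = sum(1 for doc in doc_sets if term in doc)  # df >= 1 by construction
        scores[term] = freq * math.log(n_docs / df)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))


def precision_at_k(ranked_terms, relevant_terms, k):
    """Fraction of the top-k ranked terms that appear in relevant_terms."""
    top = ranked_terms[:k]
    if not top:
        return 0.0
    return sum(1 for t in top if t in relevant_terms) / len(top)
```

For instance, `precision_at_k(["a", "b", "c", "d"], {"a", "c"}, 2)` returns `0.5`, because one of the two top-ranked terms is relevant. In a setting like the one described, `relevant_terms` would be the manually curated ontology terms and `ranked_terms` the output of an ATR method.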
Pages: 12