Terminologies for text-mining: an experiment in the lipoprotein metabolism domain

Cited by: 13
Authors
Alexopoulou, Dimitra [1]
Waechter, Thomas [1]
Pickersgill, Laura [2]
Eyre, Cecilia [3]
Schroeder, Michael [1]
Affiliations
[1] Tech Univ Dresden, Ctr Biotechnol BIOTEC, D-01062 Dresden, Germany
[2] Unilever Corp Res, Colworth MK44 1LQ, England
[3] Unilever Safety & Environm Assurance Ctr, Colworth MK44 1LQ, England
DOI
10.1186/1471-2105-9-S4-S2
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract

Background: The engineering of ontologies, especially for use in text mining, is still a new research field. There does not yet exist a well-defined theory and technology for ontology construction, and many ontology design steps remain manual, relying on personal experience and intuition. However, there have been a few efforts toward the automatic construction of ontologies in the form of extracted lists of terms and the relations between them.

Results: We share experience acquired during the manual development of a lipoprotein metabolism ontology (LMO) intended for text mining. We compare the manually created ontology terms with the terminologies derived automatically by four different automatic term recognition (ATR) methods. Up to 89% of the top 50 predicted terms are relevant. For the top 1000 terms, the best method still generates 51% relevant terms. In a corpus of 3066 documents, 53% of LMO terms occur, and 38% can be generated by one of the methods.

Conclusions: Given their high precision, automatic methods can help decrease development time and provide significant support for identifying domain-specific vocabulary. The coverage of the domain vocabulary depends strongly on the underlying documents. Ontology development for text mining should be performed semi-automatically, taking ATR results as input and following the guidelines we described.

Availability: The TFIDF term recognition is available as a Web Service, described at http://gopubmed4.biotec.tu-dresden.de/IdavollWebService/services/CandidateTermGeneratorService?wsdl.
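The abstract reports ATR quality in two ways: the share of relevant terms among the top-ranked candidates (precision at k, e.g. the top 50 or top 1000), and the share of LMO terms that a method generates at all (coverage). The Python sketch below illustrates both measures on top of a simple TF-IDF candidate ranking. It is not the authors' TFIDF method or the Web Service above. The n-gram candidate extraction, the small stopword list, the smoothed IDF formula, and the function names (`candidate_terms`, `rank_terms_tfidf`, `precision_at_k`, `coverage`) are hypothetical choices made for illustration, and the toy documents and gold terms are invented.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; a real ATR pipeline would use a fuller list
# and typically part-of-speech patterns as well.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def candidate_terms(text, max_n=2):
    """Yield lowercase n-grams (1 <= n <= max_n) that neither start nor end with a stopword."""
    tokens = re.findall(r"[a-z0-9][a-z0-9-]*", text.lower())
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram)


def rank_terms_tfidf(docs, max_n=2):
    """Rank candidate terms by corpus frequency times a smoothed IDF (assumed formula)."""
    tf = Counter()  # total occurrences of each candidate across the corpus
    df = Counter()  # number of documents containing each candidate
    for doc in docs:
        counts = Counter(candidate_terms(doc, max_n))
        tf.update(counts)
        df.update(counts.keys())
    n_docs = len(docs)
    scores = {
        term: tf[term] * (math.log((1 + n_docs) / (1 + df[term])) + 1)
        for term in tf
    }
    # Highest score first; ties are broken alphabetically so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


def precision_at_k(ranked_terms, gold, k):
    """Fraction of the top-k ranked terms that belong to the gold terminology."""
    if k <= 0:
        raise ValueError("k must be positive")
    top = ranked_terms[:k]
    if not top:
        return 0.0
    # Divides by the number of terms actually returned when fewer than k exist.
    return sum(term in gold for term in top) / len(top)


def coverage(candidates, gold):
    """Fraction of gold terms that appear anywhere among the generated candidates."""
    if not gold:
        return 0.0
    return len(set(gold) & set(candidates)) / len(gold)


if __name__ == "__main__":
    # Invented toy corpus and gold terms, for demonstration only.
    docs = [
        "LDL cholesterol binds the LDL receptor",
        "HDL cholesterol transport",
        "LDL receptor expression in liver",
    ]
    gold = {"ldl", "hdl", "ldl receptor", "cholesterol transport", "apolipoprotein"}
    ranked = [term for term, _ in rank_terms_tfidf(docs)]
    print(ranked[:4])                       # ['ldl', 'cholesterol', 'ldl receptor', 'receptor']
    print(precision_at_k(ranked, gold, 4))  # 0.5
    print(coverage(ranked, gold))           # 0.8
```

The smoothed IDF keeps every score positive, so a term that occurs in every document of a small domain corpus is not zeroed out; ATR methods that contrast the domain corpus against a general background corpus would rank candidates differently.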
Pages: 12