Terminologies for text-mining; an experiment in the lipoprotein metabolism domain

Cited by: 13
Authors
Alexopoulou, Dimitra [1 ]
Waechter, Thomas [1 ]
Pickersgill, Laura [2 ]
Eyre, Cecilia [3 ]
Schroeder, Michael [1 ]
Affiliations
[1] Tech Univ Dresden, Ctr Biotechnol BIOTEC, D-01062 Dresden, Germany
[2] Unilever Corp Res, Colworth MK44 1LQ, England
[3] Unilever Safety & Environm Assurance Ctr, Colworth MK44 1LQ, England
DOI
10.1186/1471-2105-9-S4-S2
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification code
071010; 081704
Abstract
Background: The engineering of ontologies, especially for text-mining use, is still a new research field. There is not yet a well-defined theory or technology for ontology construction. Many ontology design steps remain manual and rely on personal experience and intuition. However, there have been a few efforts at constructing ontologies automatically, in the form of extracted lists of terms and the relations between them.

Results: We share experience acquired during the manual development of a lipoprotein metabolism ontology (LMO) intended for text-mining. We compare the manually created ontology terms with the terminology derived automatically by four different automatic term recognition (ATR) methods. The top 50 predicted terms contain up to 89% relevant terms. For the top 1000 terms, the best method still generates 51% relevant terms. In a corpus of 3066 documents, 53% of LMO terms are present, and 38% can be generated by one of the methods.

Conclusions: Given high precision, automatic methods can help reduce development time and provide significant support for identifying domain-specific vocabulary. Coverage of the domain vocabulary depends strongly on the underlying documents. Ontology development for text mining should be performed semi-automatically, taking ATR results as input and following the guidelines we describe.

Availability: The TFIDF term recognition is available as a Web Service, described at http://gopubmed4.biotec.tu-dresden.de/IdavollWebService/services/CandidateTermGeneratorService?wsdl.
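The abstract evaluates ATR output in two ways: the share of relevant terms among the top-ranked candidates (for example, up to 89% in the top 50) and the share of LMO terms that occur in the corpus or are generated by a method. The following is a minimal Python sketch of that kind of evaluation, not the authors' implementation or any of their four ATR methods. It assumes a plain TF-IDF scorer over word n-grams with a small stop-word filter, and it treats a term as "relevant" simply if it appears in a reference vocabulary standing in for the ontology. The names `STOPWORDS`, `candidate_terms`, `tfidf_rank`, `precision_at_k` and `coverage`, and the toy corpus, are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real ATR pipeline would use a fuller list
# and usually part-of-speech patterns as well.
STOPWORDS = {"a", "an", "and", "by", "in", "is", "of", "the", "to"}


def candidate_terms(text, max_len=2):
    """Word n-grams (1..max_len) that neither start nor end with a stop word."""
    words = re.findall(r"[a-z0-9-]+", text.lower())
    grams = []
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            grams.append(" ".join(gram))
    return grams


def tfidf_rank(docs, max_len=2):
    """Rank candidate terms by their highest TF-IDF score in any document."""
    doc_counts = [Counter(candidate_terms(d, max_len)) for d in docs]
    df = Counter()  # number of documents containing each term
    for counts in doc_counts:
        df.update(counts.keys())
    n_docs = len(docs)
    best = {}
    for counts in doc_counts:
        total = sum(counts.values())
        for term, tf in counts.items():
            score = (tf / total) * math.log(n_docs / df[term])
            best[term] = max(best.get(term, 0.0), score)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(best, key=lambda t: (-best[t], t))


def precision_at_k(ranked, reference, k):
    """Share of the top-k ranked terms that occur in the reference vocabulary."""
    top = ranked[:k]
    return sum(t in reference for t in top) / len(top) if top else 0.0


def coverage(reference, found):
    """Share of reference terms that also occur in `found`."""
    return len(set(reference) & set(found)) / len(reference)


# Toy usage with a hypothetical three-document corpus and reference list.
docs = [
    "LDL cholesterol binds the LDL receptor",
    "HDL cholesterol transport in plasma",
    "The LDL receptor mediates cholesterol uptake",
]
reference = {"ldl receptor", "hdl cholesterol", "ldl cholesterol", "cholesterol"}
ranked = tfidf_rank(docs)
print(precision_at_k(ranked, reference, 5))  # prints 0.2
print(coverage(reference, ranked))           # prints 1.0
```

In this toy run every reference term is generated, yet "cholesterol" is ranked last because it occurs in all three documents and its IDF is zero. This illustrates one reason plain TF-IDF ranking can bury ubiquitous domain terms, and why precision and coverage are worth measuring separately.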
Pages: 12
Journal: BMC Bioinformatics, 9