Terminologies for text-mining; an experiment in the lipoprotein metabolism domain

Cited by: 13
Authors
Alexopoulou, Dimitra [1]
Waechter, Thomas [1]
Pickersgill, Laura [2]
Eyre, Cecilia [3]
Schroeder, Michael [1]
Affiliations
[1] Tech Univ Dresden, Ctr Biotechnol BIOTEC, D-01062 Dresden, Germany
[2] Unilever Corp Res, Colworth MK44 1LQ, England
[3] Unilever Safety & Environm Assurance Ctr, Colworth MK44 1LQ, England
DOI
10.1186/1471-2105-9-S4-S2
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: The engineering of ontologies, especially with a view to text-mining use, is still a new research field. No well-defined theory or technology for ontology construction exists yet. Many ontology design steps remain manual and rely on personal experience and intuition. However, there are a few efforts toward the automatic construction of ontologies in the form of extracted lists of terms and the relations between them.
Results: We share experience acquired during the manual development of a lipoprotein metabolism ontology (LMO) to be used for text-mining. We compare the manually created ontology terms with the terminology derived automatically by four different automatic term recognition (ATR) methods. The top 50 predicted terms contain up to 89% relevant terms. For the top 1000 terms, the best method still generates 51% relevant terms. A corpus of 3066 documents contains 53% of the LMO terms, and 38% can be generated with one of the methods.
Conclusions: Given high precision, automatic methods can help decrease development time and provide significant support for the identification of domain-specific vocabulary. The coverage of the domain vocabulary depends strongly on the underlying documents. Ontology development for text mining should be performed in a semi-automatic way, taking ATR results as input and following the guidelines we described.
Availability: The TFIDF term recognition is available as a Web Service, described at http://gopubmed4.biotec.tu-dresden.de/IdavollWebService/services/CandidateTermGeneratorService?wsdl.
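The abstract reports two kinds of figures: the share of relevant terms among the top-ranked candidates (precision at k) and the share of ontology terms found in the corpus (coverage). The following is a minimal sketch of how such figures could be computed, assuming a plain TF-IDF score over single tokens. It is not the authors' implementation or their Web Service; the function names, the toy corpus, and the reference term set are all hypothetical, and real ATR methods also handle multi-word terms.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Sum TF-IDF per term over a list of tokenised documents (illustrative only)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = Counter()
    for tokens in docs:
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    return scores


def rank_terms(scores):
    """Order terms by descending score, breaking ties alphabetically."""
    return sorted(scores, key=lambda term: (-scores[term], term))


def precision_at_k(ranked_terms, relevant, k):
    """Fraction of the top-k candidates that appear in the reference set."""
    top = ranked_terms[:k]
    if not top:
        return 0.0
    return sum(1 for term in top if term in relevant) / len(top)


def coverage(reference_terms, docs):
    """Fraction of reference terms that occur anywhere in the corpus."""
    vocabulary = {token for tokens in docs for token in tokens}
    if not reference_terms:
        return 0.0
    return len(reference_terms & vocabulary) / len(reference_terms)


# Hypothetical toy corpus and reference vocabulary, not data from the paper.
toy_docs = [
    ["lipoprotein", "metabolism", "the", "of"],
    ["lipoprotein", "receptor", "the", "of"],
    ["cholesterol", "transport", "the", "of"],
]
toy_reference = {"lipoprotein", "cholesterol", "receptor", "apolipoprotein"}

ranked = rank_terms(tfidf_scores(toy_docs))
print(ranked[:4])
print(precision_at_k(ranked, toy_reference, 4))
print(coverage(toy_reference, toy_docs))
```

Terms that occur in every document, such as "the" and "of" in the toy corpus, receive an IDF of zero and drop to the bottom of the ranking, which is the effect a TF-IDF-based candidate generator relies on to surface domain-specific vocabulary.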
Pages: 12
Related papers
50 in total
  • [41] Text Mining of Journal Articles for Sleep Disorder Terminologies
    Lam, Calvin
    Lai, Fu-Chih
    Wang, Chia-Hui
    Lai, Mei-Hsin
    Hsu, Nanly
    Chung, Min-Huey
    PLOS ONE, 2016, 11 (05)
  • [42] A Review on Electronic Health Record Text-Mining for Biomedical Name Entity Recognition in Healthcare Domain
    Ahmad, Pir Noman
    Shah, Adnan Muhammad
    Lee, KangYoon
    HEALTHCARE, 2023, 11 (09)
  • [43] Green IT Practices across Industries: A Text-Mining based
    Deng, Qi
    Ji, Shaobo
    Wang, Yun
    AMCIS 2017 PROCEEDINGS, 2017
  • [44] Drug repurposing: A bibliometric analysis by text-mining PubMed
    Baker, Nancy
    Ekins, Sean
    Williams, Antony
    Tropsha, Alexander
    ABSTRACTS OF PAPERS OF THE AMERICAN CHEMICAL SOCIETY, 2017, 253
  • [45] A Text-Mining System for Concept Annotation in Biomedical Full Text Articles
    Wei, Chih-Hsuan
    Allot, Alexis
    Leaman, Robert
    Lu, Zhiyong
    ACM-BCB'19: PROCEEDINGS OF THE 10TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY AND HEALTH INFORMATICS, 2019, : 540 - 540
  • [46] Mapping knowledge landscapes and research frontiers of gastrointestinal microbiota and bone metabolism: a text-mining study
    Wu, Haiyang
    Sun, Zaijie
    Guo, Qiang
    Li, Cheng
    FRONTIERS IN CELLULAR AND INFECTION MICROBIOLOGY, 2024, 14
  • [47] Text-mining approach to evaluate terms for ontology development
    Tsoi, Lam C.
    Patel, Ravi
    Zhao, Wenle
    Zheng, W. Jim
    JOURNAL OF BIOMEDICAL INFORMATICS, 2009, 42 (05) : 824 - 830
  • [48] New Challenges for Biological Text-Mining in the Next Decade
    Dai, Hong-Jie
    Chang, Yen-Ching
    Tsai, Richard Tzong-Han
    Hsu, Wen-Lian
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2010, 25 (01) : 169 - +
  • [49] Text-mining Similarity Approximation Operators for Opinion Mining in BI tools
    Kaplanski, Pawel
    Rizun, Nina
    Taranenko, Yurii
    Seganti, Alessandro
    PROCEEDINGS OF THE 11TH SCIENTIFIC CONFERENCE INTERNET IN THE INFORMATION SOCIETY 2016, 2016, : 121 - 140
  • [50] Assessing manufacturing strategy definitions utilising text-mining
    Kulkarni, Sourabh
    Verma, Priyanka
    Mukundan, R.
    INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH, 2019, 57 (14) : 4519 - 4546