Mining biomarker information in biomedical literature

Cited by: 32
Authors
Younesi, Erfan [1, 2]
Toldo, Luca [3]
Mueller, Bernd [1]
Friedrich, Christoph M. [1, 5]
Novac, Natalia [3]
Scheer, Alexander [4]
Hofmann-Apitius, Martin [1, 2]
Fluck, Juliane [1]
Affiliations
[1] Fraunhofer Inst Algorithms & Sci Comp SCAI, Dept Bioinformat, D-53754 St Augustin, Germany
[2] Univ Bonn, Bonn Aachen Int Ctr Informat Technol B IT, Bonn, Germany
[3] Merck KGaA, Merck Serono, Operat Excellence & Site Coordinat, Knowledge Management, Darmstadt, Germany
[4] Merck KGaA, Merck Serono, Informat & Knowledge Management, Geneva, Switzerland
[5] Univ Appl Sci & Arts, Dept Comp Sci, Dortmund, Germany
Keywords
Text-mining; Biomarker discovery; Information retrieval; Terminology; Discovery; Cancer
DOI
10.1186/1472-6947-12-148
Chinese Library Classification number
R-058
Abstract
Background: For the selection and evaluation of potential biomarkers, inclusion of already published information is of utmost importance. In spite of significant advancements in text- and data-mining techniques, the vast knowledge space of biomarkers in biomedical text has remained unexplored. Existing named entity recognition approaches are not sufficiently selective for the retrieval of biomarker information from the literature. The purpose of this study was to identify textual features that enhance the effectiveness of biomarker information retrieval for different indication areas and diverse end user perspectives.
Methods: A biomarker terminology was created and further organized into six concept classes. Performance of this terminology was optimized towards balanced selectivity and specificity. The information retrieval performance using the biomarker terminology was evaluated based on various combinations of the terminology's six classes. Further validation of these results was performed on two independent corpora representing two different neurodegenerative diseases.
Results: The current state of the biomarker terminology contains 119 entity classes supported by 1890 different synonyms. The information retrieval results show an improved retrieval rate of informative abstracts, which is achieved by including clinical management terms and evidence of gene/protein alterations (e.g. gene/protein expression status or certain polymorphisms) in combination with disease and gene name recognition. When additional filtering through other classes (e.g. diagnostic or prognostic methods) is applied, the typically high number of unspecific search results is significantly reduced. The evaluation results suggest that this approach enables the automated identification of biomarker information in the literature. A demo version of the search engine SCAIView, including the biomarker retrieval, is made available to the public through http://www.scaiview.com/scaiviewacademia.html.
Conclusions: The approach presented in this paper demonstrates that using a dedicated biomarker terminology for automated analysis of the scientific literature may be helpful as an aid to finding biomarker information in text. Successful extraction of candidate biomarker information from published resources can be considered the first step towards developing novel hypotheses. These hypotheses will be valuable for early decision-making in the drug discovery and development process.
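The class-combination filtering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation or the SCAIView API: the class names loosely follow the examples given in the abstract, while the synonym lists, the disease and gene dictionaries, and the function names `find_terms`, `annotate`, and `retrieve` are all hypothetical and chosen only for illustration. Simple dictionary matching stands in for the named entity recognition used in the study.

```python
import re

# Hypothetical mini-terminology: class names echo the abstract's examples,
# but every synonym below is invented for illustration only.
TERMINOLOGY = {
    "clinical_management": ["diagnosis", "prognosis", "treatment response"],
    "alteration_evidence": ["overexpression", "polymorphism", "upregulated"],
    "method": ["immunohistochemistry", "elisa"],
}
# Toy stand-ins for disease and gene name recognition (also invented).
DISEASES = ["alzheimer's disease", "parkinson's disease"]
GENES = ["APOE", "SNCA"]


def find_terms(text, terms):
    """Return the terms that occur in text as whole words, case-insensitively."""
    hits = []
    for term in terms:
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(term)
    return hits


def annotate(abstract):
    """Map each class (plus disease and gene) to the terms found in the abstract."""
    annotations = {
        name: find_terms(abstract, terms) for name, terms in TERMINOLOGY.items()
    }
    annotations["disease"] = find_terms(abstract, DISEASES)
    annotations["gene"] = find_terms(abstract, GENES)
    return annotations


def retrieve(abstracts, required_classes):
    """Keep only abstracts with at least one hit in every required class."""
    selected = []
    for abstract in abstracts:
        annotations = annotate(abstract)
        if all(annotations[name] for name in required_classes):
            selected.append(abstract)
    return selected


docs = [
    "APOE polymorphism is associated with prognosis in Alzheimer's disease.",
    "SNCA was studied in Parkinson's disease.",
    "A review of ELISA protocols.",
]
base = ["disease", "gene"]
broad = retrieve(docs, base)
narrow = retrieve(docs, base + ["clinical_management", "alteration_evidence"])
print(len(broad), len(narrow))
```

In this toy corpus, requiring only disease and gene mentions keeps two abstracts, while additionally requiring clinical management terms and alteration evidence keeps one. This mirrors, in miniature, the reduction of unspecific hits that the abstract reports for combined classes.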
Pages: 13
Journal
BMC Medical Informatics and Decision Making, 12