Molecular profiling of thyroid cancer subtypes using large-scale text mining

Cited by: 9
Authors
Wu, Chengkun [1, 2, 3]
Schwartz, Jean-Marc [1]
Brabant, Georg [4, 5]
Nenadic, Goran [3, 6, 7]
Affiliations
[1] Univ Manchester, Fac Life Sci, Manchester M13 9PT, Lancs, England
[2] Univ Manchester, Doctoral Training Ctr Integrat Syst Biol, Manchester M1 7DN, Lancs, England
[3] Manchester Inst Biotechnol, Manchester M1 7DN, Lancs, England
[4] Univ Manchester, Christie Hosp, Dept Endocrinol, Manchester M20 4BX, Lancs, England
[5] Univ Lubeck, Expt & Clin Endocrinol, Med Clin 1, D-23538 Lubeck, Germany
[6] Univ Manchester, Sch Comp Sci, Manchester M13 9PL, Lancs, England
[7] Hlth E Res Ctr HeRC, Manchester M13 9PL, Lancs, England
Source
BMC MEDICAL GENOMICS | 2014 / Vol. 7
Funding
Biotechnology and Biological Sciences Research Council (BBSRC), UK;
Keywords
SEARCH ENGINE; GENE; IDENTIFICATION; MANAGEMENT; EXTRACTION; EXPRESSION; LIBRARY; PATHWAY; SYSTEM;
DOI
10.1186/1755-8794-7-S3-S3
Chinese Library Classification (CLC) number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Background: Thyroid cancer is the most common endocrine tumor, and its incidence is steadily increasing. It is classified into multiple histopathological subtypes with potentially distinct molecular mechanisms. Identifying the most relevant genes and biological pathways reported in the thyroid cancer literature is vital for understanding the disease and developing targeted therapeutics. Results: We developed a large-scale text mining system to generate a molecular profile of thyroid cancer subtypes. The system first applies a subtype classification method to the thyroid cancer literature, which employs a scoring scheme to assign different subtypes to articles. We evaluated the classification method on a gold standard derived from the PubMed Supplementary Concept annotations, achieving a micro-average F1-score of 85.9% for primary subtypes. We then used the subtype classification results to extract genes and pathways associated with different thyroid cancer subtypes and successfully unveiled important genes and pathways, including some instances that are missing from current manually annotated databases or the most recent review articles. Conclusions: Identification of key genes and pathways plays a central role in understanding the molecular biology of thyroid cancer. Integrating subtype context can allow prioritized screening for diagnostic biomarkers and novel molecular targeted therapeutics.
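For readers unfamiliar with the evaluation metric named in the abstract, the following is a minimal sketch of how a micro-averaged F1-score can be computed for multi-label subtype assignments. It is not the authors' evaluation code: the function name `micro_f1`, the subtype labels, and the toy gold and predicted sets are all hypothetical and chosen only for illustration.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over per-article label sets.

    gold, predicted: lists of sets of subtype labels, one set per article.
    Counts are pooled over all articles and labels before computing
    precision and recall, which is what "micro-average" refers to.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # labels assigned and correct
        fp += len(p - g)   # labels assigned but not in the gold standard
        fn += len(g - p)   # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: four articles with illustrative subtype labels.
gold = [{"papillary"}, {"follicular"}, {"medullary"}, {"papillary", "anaplastic"}]
predicted = [{"papillary"}, {"papillary"}, {"medullary"}, {"papillary"}]

score = micro_f1(gold, predicted)
print(f"micro-F1 = {score:.3f}")
```

In this toy case the pooled counts are 3 true positives, 1 false positive, and 2 false negatives, giving a precision of 0.75 and a recall of 0.60. Because counts are pooled, frequent subtypes weigh more heavily than rare ones, which is the main difference from a macro-average.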
Pages: 11
Related papers
50 in total
  • [1] Molecular profiling of thyroid cancer subtypes using large-scale text mining
    Chengkun Wu
    Jean-Marc Schwartz
    Georg Brabant
    Goran Nenadic
    [J]. BMC Medical Genomics, 7
  • [2] Constructing a molecular interaction network for thyroid cancer via large-scale text mining of gene and pathway events
    Wu, Chengkun
    Schwartz, Jean-Marc
    Brabant, Georg
    Peng, Shao-Liang
    Nenadic, Goran
    [J]. BMC SYSTEMS BIOLOGY, 2015, 9
  • [3] Large-Scale Text Mining of Biomedical Literature
    Ginter, Filip
    [J]. ELECTRONIC PROCEEDINGS IN THEORETICAL COMPUTER SCIENCE, 2013, (116): 43-44
  • [4] Using Spark for Text Mining on Large Scale Liver Cancer Literature
    Lin, Ming-Yen
    Lin, Yu-Ju
    Hsueh, Sue-Chen
    [J]. 2021 THE 3RD INTERNATIONAL CONFERENCE ON BIG DATA ENGINEERING AND TECHNOLOGY, BDET 2021, 2021: 82-87
  • [5] Mining Large-scale Event Knowledge from Web Text
    Cao, Ya-nan
    Zhang, Peng
    Guo, Jing
    Guo, Li
    [J]. 2014 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, 2014, 29: 478-487
  • [6] Causal Knowledge Extraction through Large-Scale Text Mining
    Hassanzadeh, Oktie
    Bhattacharjya, Debarun
    Feblowitz, Mark
    Srinivas, Kavitha
    Perrone, Michael
    Sohrabi, Shirin
    Katz, Michael
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34: 13610-13611
  • [7] The BioLexicon: a large-scale terminological resource for biomedical text mining
    Paul Thompson
    John McNaught
    Simonetta Montemagni
    Nicoletta Calzolari
    Riccardo del Gratta
    Vivian Lee
    Simone Marchi
    Monica Monachini
    Piotr Pezik
    Valeria Quochi
    CJ Rupp
    Yutaka Sasaki
    Giulia Venturi
    Dietrich Rebholz-Schuhmann
    Sophia Ananiadou
    [J]. BMC Bioinformatics, 12
  • [8] The BioLexicon: a large-scale terminological resource for biomedical text mining
    Thompson, Paul
    McNaught, John
    Montemagni, Simonetta
    Calzolari, Nicoletta
    del Gratta, Riccardo
    Lee, Vivian
    Marchi, Simone
    Monachini, Monica
    Pezik, Piotr
    Quochi, Valeria
    Rupp, C. J.
    Sasaki, Yutaka
    Venturi, Giulia
    Rebholz-Schuhmann, Dietrich
    Ananiadou, Sophia
    [J]. BMC BIOINFORMATICS, 2011, 12
  • [9] Mining coherent topics in documents using word embeddings and large-scale text data
    Yao, Liang
    Zhang, Yin
    Chen, Qinfei
    Qian, Hongze
    Wei, Baogang
    Hu, Zhifeng
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2017, 64: 432-439
  • [10] Interleaved Text/Image Deep Mining on a Large-Scale Radiology Database
    Shin, Hoo-Chang
    Lu, Le
    Kim, Lauren
    Seff, Ari
    Yao, Jianhua
    Summers, Ronald M.
    [J]. 2015 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2015: 1090-1099