Supervised learning with word embeddings derived from PubMed captures latent knowledge about protein kinases and cancer

Cited by: 3
Authors
Ravanmehr, Vida [1]
Blau, Hannah [1]
Cappelletti, Luca [2]
Fontana, Tommaso [2]
Carmody, Leigh [1]
Coleman, Ben [1, 3]
George, Joshy [1]
Reese, Justin [4]
Joachimiak, Marcin [4]
Bocci, Giovanni [5, 6]
Hansen, Peter [1]
Bult, Carol [7]
Rueter, Jens [7]
Casiraghi, Elena [2]
Valentini, Giorgio [2]
Mungall, Christopher [4]
Oprea, Tudor I. [5, 6]
Robinson, Peter N. [1, 8]
Affiliations
[1] Jackson Lab Genom Med, Farmington, CT 06032 USA
[2] Univ Milan, Dipartimento Informat, AnacletoLab, Milan, Italy
[3] Univ Connecticut Hlth Ctr, Dept Genet & Genome Sci, Farmington, CT 06030 USA
[4] Lawrence Berkeley Natl Lab, Div Environm Genom & Syst Biol, Berkeley, CA 94710 USA
[5] UNM Sch Med, Dept Internal Med, Albuquerque, NM 87102 USA
[6] UNM Sch Med, UNM Comprehens Canc Ctr, Albuquerque, NM 87102 USA
[7] Jackson Lab Mammalian Genet, Bar Harbor, ME 04609 USA
[8] Univ Connecticut, Inst Syst Genom, Farmington, CT 06032 USA
Keywords
DRUG; DATABASE; TRIALS;
DOI
10.1093/nargab/lqab113
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007; 090102;
Abstract
Inhibiting protein kinases (PKs) that cause cancers has been an important topic in cancer therapy for years. So far, almost 8% of >530 PKs have been targeted by FDA-approved medications, and around 150 protein kinase inhibitors (PKIs) have been tested in clinical trials. We present an approach based on natural language processing and machine learning to investigate the relations between PKs and cancers, predicting PKs whose inhibition would be efficacious to treat a certain cancer. Our approach represents PKs and cancers as semantically meaningful 100-dimensional vectors based on word and concept neighborhoods in PubMed abstracts. We use information about phase I-IV trials in ClinicalTrials.gov to construct a training set for random forest classification. Our results with historical data show that associations between PKs and specific cancers can be predicted years in advance with good accuracy. Our tool can be used to predict the relevance of inhibiting PKs for specific cancers and to support the design of well-focused clinical trials to discover novel PKIs for cancer therapy.
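The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are random stand-ins for vectors learned from PubMed abstracts, the trial records and the cutoff year are synthetic, and encoding a PK-cancer pair as the difference of the two vectors is an assumption (the abstract does not say how pairs are encoded). All names (`pk_vectors`, `cancer_vectors`, `pair_features`, `CUTOFF`) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
DIM = 100  # embedding dimensionality stated in the abstract

# Hypothetical stand-ins for embeddings learned from PubMed abstracts.
pk_vectors = {f"PK{i}": rng.normal(size=DIM) for i in range(20)}
cancer_vectors = {f"cancer{j}": rng.normal(size=DIM) for j in range(10)}


def pair_features(pk, cancer):
    """Assumed pair encoding: difference between the PK and cancer vectors."""
    return pk_vectors[pk] - cancer_vectors[cancer]


# Synthetic trial records: (PK, cancer, year the trial started).
all_pairs = [(pk, c) for pk in pk_vectors for c in cancer_vectors]
trial_idx = rng.choice(len(all_pairs), size=60, replace=False)
trials = [(*all_pairs[i], int(rng.integers(2000, 2021))) for i in trial_idx]

# Historical split: train on trials up to CUTOFF, score the later ones.
CUTOFF = 2012  # arbitrary year chosen for this sketch
known = {(pk, c) for pk, c, _ in trials}
train_pos = [(pk, c) for pk, c, year in trials if year <= CUTOFF]
future_pos = [(pk, c) for pk, c, year in trials if year > CUTOFF]

# Negative examples: pairs that never appear in any trial record.
negatives = [pair for pair in all_pairs if pair not in known]
train_neg = negatives[: len(train_pos)]

X_train = np.array([pair_features(*p) for p in train_pos + train_neg])
y_train = np.array([1] * len(train_pos) + [0] * len(train_neg))

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Predicted probability that each later PK-cancer pair is a true association.
X_future = np.array([pair_features(*p) for p in future_pos])
scores = clf.predict_proba(X_future)[:, 1]
```

Because the vectors here are random, the scores carry no signal; the sketch only shows the shape of the workflow (embed, build labeled pairs from trial history, train, score pairs from a later period).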
Pages: 13