A Semi-supervised Learning Approach for Pan-Cancer Somatic Genomic Variant Classification

Cited by: 3
Authors
Nicora, Giovanna [1]
Marini, Simone [2]
Limongelli, Ivan [3]
Rizzo, Ettore [3]
Montoli, Stefano [1]
Tricomi, Francesca Floriana [1]
Bellazzi, Riccardo [1]
Affiliations
[1] Univ Pavia, Dept Elect Comp & Biomed Engn, Via Ferrata 1, I-27100 Pavia, Italy
[2] Univ Michigan, Dept Computat Med & Bioinformat, Ann Arbor, MI 48109 USA
[3] enGenome Srl, Via Ferrata 1, I-27100 Pavia, Italy
Keywords
Somatic variant classification; Semi-supervised learning; Autoencoder
DOI
10.1007/978-3-030-21642-9_7
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cancer arises from the accumulation of particular somatic genomic variants known as drivers. New sequencing technologies allow the identification of hundreds of variants in a tumor sample. These variants should be classified as driver or passenger (i.e. benign), but functional studies can be time-consuming and costly. Therefore, in the bioinformatics field, machine learning methods are widely applied to distinguish drivers from passengers. Recent projects, such as AACR GENIE, provide an unprecedented amount of cancer data that could be exploited to train machine learning algorithms. However, the majority of these variants are not yet classified. Approaches able to assimilate unlabeled data are therefore needed in order to fully benefit from the available omics resources. We collected and annotated a dataset of 976 known drivers and over 84,000 passengers from different databases, and we investigated whether unclassified variants from GENIE could be employed in the classification process. We characterized each variant by 94 features from multiple omics resources. We then trained different autoencoder architectures on more than 80,000 GENIE variants. An autoencoder is a type of neural network able to learn a new feature representation of the input data in an unsupervised manner. The trained autoencoders are then used to obtain new representations of the labeled dataset with a reduced number of meta-features, with the aim of reducing redundancy and extracting the relevant information. The new representations are in turn used to train and test different machine learning techniques, such as Random Forest, Support Vector Machine, Ridge Logistic Regression, and One-Class SVM. The final results, however, do not show a significant increase in classification ability when meta-features are used.
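The workflow described in the abstract (train an autoencoder on unlabeled variants, encode the labeled variants into meta-features, then fit a classifier on those meta-features) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the data are synthetic, the autoencoder has a single tanh hidden layer trained by plain gradient descent, the feature count (12) and meta-feature count (3) are arbitrary, and the helper names `train_autoencoder`, `encode`, `reconstruction_mse`, and `fit_ridge_logistic` are hypothetical. Only NumPy is used.

```python
import numpy as np

def train_autoencoder(X, n_meta, epochs=500, lr=0.01, seed=0):
    """Fit a one-hidden-layer autoencoder (tanh encoder, linear decoder)
    by full-batch gradient descent on the mean squared reconstruction error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(0.0, 0.1, (d, n_meta)); b_enc = np.zeros(n_meta)
    W_dec = rng.normal(0.0, 0.1, (n_meta, d)); b_dec = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W_enc + b_enc)            # meta-features
        err = (H @ W_dec + b_dec - X) / n         # reconstruction residual
        dH = (err @ W_dec.T) * (1.0 - H ** 2)     # backprop through tanh
        W_dec -= lr * (H.T @ err); b_dec -= lr * err.sum(axis=0)
        W_enc -= lr * (X.T @ dH);  b_enc -= lr * dH.sum(axis=0)
    return W_enc, b_enc, W_dec, b_dec

def encode(X, W_enc, b_enc):
    """Map raw features to the learned meta-features."""
    return np.tanh(X @ W_enc + b_enc)

def reconstruction_mse(X, W_enc, b_enc, W_dec, b_dec):
    return float(np.mean((encode(X, W_enc, b_enc) @ W_dec + b_dec - X) ** 2))

def fit_ridge_logistic(Z, y, l2=1.0, epochs=500, lr=0.1):
    """L2-penalized logistic regression fitted by gradient descent."""
    n, k = Z.shape
    w, b = np.zeros(k), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        w -= lr * (Z.T @ (p - y) / n + l2 * w / n)
        b -= lr * float(np.mean(p - y))
    return w, b

# Synthetic stand-in data (assumption): 12 correlated features from 3 latent factors.
rng = np.random.default_rng(42)
mix = rng.normal(size=(3, 12))

def make_variants(n):
    latent = rng.normal(size=(n, 3))
    return latent, latent @ mix + 0.1 * rng.normal(size=(n, 12))

_, X_unlabeled = make_variants(2000)              # plays the role of unclassified variants
latent_lab, X_labeled = make_variants(300)        # plays the role of the labeled set
y = (latent_lab[:, 0] > 1.0).astype(float)        # minority "driver" class

# Standardize both sets using statistics of the unlabeled data only.
mu, sd = X_unlabeled.mean(axis=0), X_unlabeled.std(axis=0)
X_unlabeled, X_labeled = (X_unlabeled - mu) / sd, (X_labeled - mu) / sd

untrained = train_autoencoder(X_unlabeled, n_meta=3, epochs=0)
trained = train_autoencoder(X_unlabeled, n_meta=3)
mse_before = reconstruction_mse(X_unlabeled, *untrained)
mse_after = reconstruction_mse(X_unlabeled, *trained)

Z_labeled = encode(X_labeled, trained[0], trained[1])   # meta-features of labeled data
w, b = fit_ridge_logistic(Z_labeled, y)
probs = 1.0 / (1.0 + np.exp(-(Z_labeled @ w + b)))      # predicted driver probability
```

In a comparison like the one the abstract reports, the same classifier would also be fitted on the raw features so that performance with and without meta-features can be contrasted on held-out data; this sketch omits that evaluation step.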
Pages: 42-46
Page count: 5