Toward Robust Self-Training Paradigm for Molecular Prediction Tasks

Cited: 0
Authors
Ma, Hehuan [1 ]
Jiang, Feng [1 ]
Rong, Yu [2 ]
Guo, Yuzhi [1 ]
Huang, Junzhou [1 ]
Affiliations
[1] Univ Texas Arlington, Dept Comp Sci & Engn, Arlington, TX 76010 USA
[2] Tencent AI Lab, Shenzhen, Peoples R China
Keywords
deep learning; molecular prediction tasks; semisupervised learning; protein secondary structure
DOI
10.1089/cmb.2023.0187
Chinese Library Classification
Q5 [Biochemistry];
Discipline codes
071010; 081704
Abstract
Molecular prediction tasks normally demand a series of professional experiments to label the target molecule, so they suffer from a shortage of labeled data. Self-training, a semisupervised learning paradigm, utilizes both labeled and unlabeled data: a teacher model is trained on the labeled data and produces pseudo labels for the unlabeled data, and the labeled and pseudo-labeled data are then jointly used to train a student model. However, the pseudo labels generated by the teacher model are generally not sufficiently accurate. We therefore propose a robust self-training strategy that employs robust loss functions to handle such noisy labels in two paradigms, generic and adaptive. We conducted experiments on three molecular biology prediction tasks with four backbone models to evaluate the proposed strategy. The results demonstrate that the proposed method enhances prediction performance across all tasks, most notably molecular regression tasks, with an average improvement of 41.5%. Visualization analysis further confirms the superiority of our method. The proposed robust self-training is a simple yet effective strategy that efficiently improves molecular biology prediction performance. It tackles the insufficiency of labeled data in molecular biology by taking advantage of both labeled and unlabeled data. Moreover, it can be easily embedded in any prediction task, serving as a universal approach for the bioinformatics community.
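The teacher-student loop described in the abstract can be sketched in a few lines. The sketch below is a toy illustration, not the paper's implementation: it uses a linear regression model, synthetic data, and the Huber loss as one example of a robust loss that damps the influence of noisy pseudo labels; all function names and hyperparameters are assumptions for illustration only.

```python
# Toy self-training sketch with a robust loss (hypothetical setup; the
# paper's actual models, tasks, and loss functions may differ).
import numpy as np

rng = np.random.default_rng(0)

def huber_grad(residual, delta=1.0):
    """Gradient of the Huber loss w.r.t. the prediction: the residual is
    clipped, so large errors (e.g., from noisy pseudo labels) have bounded
    influence on the update."""
    return np.clip(residual, -delta, delta)

def fit_linear(X, y, lr=0.1, epochs=200, robust=False):
    """Fit y ~= X @ w by gradient descent; optionally use the Huber
    gradient instead of the plain squared-error gradient."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        residual = X @ w - y
        g = huber_grad(residual) if robust else residual
        w -= lr * X.T @ g / len(y)
    return w

# Synthetic regression data: small labeled set, larger unlabeled set.
w_true = np.array([2.0, -1.0])
X_lab = rng.normal(size=(20, 2))
y_lab = X_lab @ w_true + 0.1 * rng.normal(size=20)
X_unl = rng.normal(size=(200, 2))

# 1) Teacher: trained on labeled data only.
w_teacher = fit_linear(X_lab, y_lab)

# 2) Pseudo labels for unlabeled data, deliberately corrupted on a subset
#    to mimic inaccurate teacher predictions.
y_pseudo = X_unl @ w_teacher
noisy_idx = rng.choice(len(y_pseudo), size=20, replace=False)
y_pseudo[noisy_idx] += rng.normal(scale=10.0, size=20)

# 3) Student: trained jointly on labeled + pseudo-labeled data with the
#    robust loss, which limits how much the corrupted labels can hurt.
X_all = np.vstack([X_lab, X_unl])
y_all = np.concatenate([y_lab, y_pseudo])
w_student = fit_linear(X_all, y_all, robust=True)
```

Despite 10% of the pseudo labels being heavily corrupted, the clipped Huber gradient keeps the student's weights close to the true ones, which is the core intuition behind pairing self-training with a robust loss.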
Pages: 213-228
Page count: 16