Active Learning algorithm for Threshold of Decision Probability on Imbalanced Text Classification based on Protein-Protein Interaction Documents

Cited by: 3
Authors
Xu, Guixian [1, 2]
Niu, Zhendong [1 ]
Gao, Xu [3 ]
Cao, Yujuan [1 ]
Zhao, Yumin [1 ]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci, Beijing, Peoples R China
[2] Minzu Univ, Coll Informat Engn, Beijing, Peoples R China
[3] North China Grid Co Ltd, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
imbalanced text classification; machine learning; protein-protein interaction
DOI
10.1109/DSDE.2010.28
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The study of host-pathogen protein-protein interactions (PPIs) is essential to understanding the disease-causing mechanisms of human pathogens. A large number of scientific findings about PPIs are reported in the biomedical literature. Building a document classification system can accelerate the mining and curation of PPI knowledge. As imbalanced data sets become more common, handling the imbalanced classification problem is becoming a hot topic in machine learning. In this paper, we propose an Active Learning algorithm for Threshold of Decision Probability (ALTDP) to address the misclassification of the minority class on an imbalanced host-pathogen PPI data set. The results demonstrate that the proposed approach significantly improves classification accuracy on the imbalanced data set.
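The abstract names the idea of a decision probability threshold but gives no procedural details, so the following is only a minimal sketch of the general technique of threshold moving for an imbalanced binary classifier, not the ALTDP algorithm itself. The function names, the candidate grid, and the choice of minority-class F1 as the selection criterion are all assumptions made for illustration; the active learning component is not modeled.

```python
from typing import Sequence, Tuple


def f1_at_threshold(probs: Sequence[float], labels: Sequence[int],
                    threshold: float) -> float:
    """F1 of the minority (positive, label 1) class when a document is
    predicted positive if its probability is >= threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def choose_threshold(probs: Sequence[float], labels: Sequence[int],
                     candidates: Sequence[float]) -> Tuple[float, float]:
    """Return the (threshold, F1) pair with the highest minority-class F1.
    Ties keep the earliest candidate. Hypothetical helper, assumed here
    to run on a held-out validation set."""
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        f1 = f1_at_threshold(probs, labels, t)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

With a default cutoff of 0.5, a classifier trained on skewed data tends to assign minority documents probabilities below the cutoff; scanning a grid of lower thresholds on validation data is one simple way to recover them.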
Pages: 78-82
Page count: 5