Predicting protein function via multi-label supervised topic model on gene ontology

Cited by: 10
Authors
Liu, Lin [1 ,2 ]
Tang, Lin [3 ]
He, Libo [1 ]
Yao, Shaowen [4 ]
Zhou, Wei [4 ]
Affiliations
[1] Yunnan Univ, Sch Informat, Kunming, Yunnan, Peoples R China
[2] Yunnan Normal Univ, Sch Informat, Minist Educ, Key Lab Educ Informatizat Nationalities, Kunming, Yunnan, Peoples R China
[3] Yunnan Normal Univ, Key Lab Educ Informatizat Nationalities, Minist Educ, Kunming, Yunnan, Peoples R China
[4] Yunnan Univ, Natl Pilot Sch Software, Kunming, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Topic modelling; protein function; gene ontology; multi-label classification; NETWORKS;
DOI
10.1080/13102818.2017.1307697
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology];
Discipline classification codes
071005 ; 0836 ; 090102 ; 100705 ;
Abstract
As biological datasets accumulate rapidly, computational methods that automate protein function prediction are critically needed. Protein function prediction can be treated as a multi-label classification problem whose output is a set of functional annotations for each protein. Biologists, however, also want to discover the correlations between protein attributes and functions. We introduce a multi-label supervised topic model into protein function prediction and investigate the advantages of this approach. The topic model not only estimates the function probability distribution for each protein instance effectively, but also directly provides the word probability distribution for each function. To the best of our knowledge, this is the first effort to apply a multi-label supervised topic model to protein function prediction. In this paper, we model a protein as a document and a function label as a topic. First, a set of protein sequences is formalized into a bag of words. Then, we perform inference and estimate the model parameters to predict protein functions. Experimental results on yeast and human datasets demonstrate the effectiveness of this multi-label supervised topic model for protein function prediction, and show that it delivers superior results to the compared algorithms. In summary, the method discussed in this paper provides a new, efficient approach to protein function prediction and reveals more information about functions.
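The abstract's first step, turning protein sequences into a bag of words, can be illustrated with a minimal sketch. The abstract does not say how a "word" is defined, so this example assumes overlapping k-mers of amino-acid letters; the function name `sequence_to_bag_of_words`, the default `k = 3`, and the toy sequences and labels are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def sequence_to_bag_of_words(sequence: str, k: int = 3) -> Counter:
    """Count overlapping k-mers in one protein sequence.

    Assumption: a "word" is a k-mer of amino-acid letters. The paper may
    define words differently.
    """
    seq = sequence.strip().upper()
    # A sequence shorter than k yields an empty bag.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical toy corpus: each protein ("document") carries a sequence
# and a set of function labels ("topics"). The labels are placeholders.
proteins = {
    "protein_1": ("MKVMKV", {"function_A", "function_B"}),
    "protein_2": ("MKVLA", {"function_A"}),
}

# One (bag of words, label set) pair per protein: the kind of input a
# multi-label supervised topic model would be trained on.
corpus = {
    name: (sequence_to_bag_of_words(seq), labels)
    for name, (seq, labels) in proteins.items()
}
```

Each protein then plays the role of a document whose word counts and label set would be passed to the topic model; inference and parameter estimation are outside the scope of this sketch.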
Pages: 630-638
Number of pages: 9