Multi-label literature classification based on the Gene Ontology graph

Cited by: 17
Authors
Jin, Bo [1]
Muller, Brian [1]
Zhai, Chengxiang [2]
Lu, Xinghua [1]
Affiliations
[1] Med Univ S Carolina, Dept Biostat Bioinformat & Epidemiol, Charleston, SC 29425 USA
[2] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
DOI
10.1186/1471-2105-9-525
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704
Abstract
Background: The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification.
Results: In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community.
Conclusion: Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate protein annotation based on the literature.
Pages: 15
Related papers
50 in total
  • [31] Scene-Aware Label Graph Learning for Multi-Label Image Classification
    Zhu, Xuelin
    Liu, Jian
    Liu, Weijia
    Ge, Jiawei
    Liu, Bo
    Cao, Jiuxin
    [J]. 2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 1473 - 1482
  • [32] Label-representative graph convolutional network for multi-label text classification
    Huy-The Vu
    Minh-Tien Nguyen
    Van-Chien Nguyen
    Minh-Hieu Pham
    Van-Quyet Nguyen
    Van-Hau Nguyen
    [J]. APPLIED INTELLIGENCE, 2023, 53 (12) : 14759 - 14774
  • [33] Label-aware graph representation learning for multi-label image classification
    Chen, Yilu
    Zou, Changzhong
    Chen, Jianli
    [J]. NEUROCOMPUTING, 2022, 492 : 50 - 61
  • [34] A multi-label classification based approach for sentiment classification
    Liu, Shuhua Monica
    Chen, Jiun-Hung
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2015, 42 (03) : 1083 - 1093
  • [35] Semi-supervised Graph Embedding for Multi-label Graph Node Classification
    Gao, Kaisheng
    Zhang, Jing
    Zhou, Cangqi
    [J]. WEB INFORMATION SYSTEMS ENGINEERING - WISE 2019, 2019, 11881 : 555 - 567
  • [36] Gene function prediction based on combining gene ontology hierarchy with multi-instance multi-label learning
    Li, Zejun
    Liao, Bo
    Li, Yun
    Liu, Wenhua
    Chen, Min
    Cai, Lijun
    [J]. RSC ADVANCES, 2018, 8 (50) : 28503 - 28509
  • [37] Identifying Interdisciplinary Sci-Tech Literature Based on Multi-Label Classification
    Wang W.
    Ning Z.
    Du Y.
    Zhou Y.
    [J]. Data Analysis and Knowledge Discovery, 2023, 7 (01) : 102 - 112
  • [38] A Systematic Literature Review on Multi-Label Classification based on Machine Learning Algorithms
    Endut, Nurshahira
    Hamzah, W. M. Amir Fazamin W.
    Ismail, Ismahafezi
    Yusof, Mohd Kamir
    Abu Baker, Yousef
    Yusoff, Hafiz
    [J]. TEM JOURNAL-TECHNOLOGY EDUCATION MANAGEMENT INFORMATICS, 2022, 11 (02) : 658 - 666
  • [39] A Gene Expression Programming Algorithm for Multi-Label Classification
    Avila, J. L.
    Gibaja, E. L.
    Zafra, A.
    Ventura, S.
    [J]. JOURNAL OF MULTIPLE-VALUED LOGIC AND SOFT COMPUTING, 2011, 17 (2-3) : 183 - 206
  • [40] Multi-label classification of gene function using MLPs
    Skabar, Andrew
    Wollersheim, Dennis
    Whitfort, Tim
    [J]. 2006 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORK PROCEEDINGS, VOLS 1-10, 2006, : 2234 - +