Multi-label literature classification based on the Gene Ontology graph

Cited by: 17
Authors
Jin, Bo [1 ]
Muller, Brian [1 ]
Zhai, Chengxiang [2 ]
Lu, Xinghua [1 ]
Affiliations
[1] Med Univ S Carolina, Dept Biostat Bioinformat & Epidemiol, Charleston, SC 29425 USA
[2] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
DOI
10.1186/1471-2105-9-525
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: The Gene Ontology is a controlled vocabulary for representing knowledge related to genes and proteins in a computable form. The current effort of manually annotating proteins with the Gene Ontology is outpaced by the rate of accumulation of biomedical knowledge in literature, which urges the development of text mining approaches to facilitate the process by automatically extracting the Gene Ontology annotation from literature. The task is usually cast as a text classification problem, and contemporary methods are confronted with unbalanced training data and the difficulties associated with multi-label classification.
Results: In this research, we investigated the methods of enhancing automatic multi-label classification of biomedical literature by utilizing the structure of the Gene Ontology graph. We have studied three graph-based multi-label classification algorithms, including a novel stochastic algorithm and two top-down hierarchical classification methods for multi-label literature classification. We systematically evaluated and compared these graph-based classification algorithms to a conventional flat multi-label algorithm. The results indicate that, through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods can significantly improve predictions of the Gene Ontology terms implied by the analyzed text. Furthermore, the graph-based multi-label classifiers are capable of suggesting Gene Ontology annotations (to curators) that are closely related to the true annotations even if they fail to predict the true ones directly. A software package implementing the studied algorithms is available for the research community.
Conclusion: Through utilizing the information from the structure of the Gene Ontology graph, the graph-based multi-label classification methods have better potential than the conventional flat multi-label classification approach to facilitate protein annotation based on the literature.
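The abstract contrasts a flat multi-label classifier with top-down hierarchical classification but does not spell out either procedure. The sketch below is a generic illustration of that contrast, not the authors' implementation: the toy ontology, the term names, the per-term scores, and the 0.5 threshold are all hypothetical, and the scores stand in for the outputs of per-term classifiers.

```python
from collections import deque

# Hypothetical toy ontology (parent -> children). It is a DAG, not a tree:
# "binding_kinase" has two parents, as terms in the Gene Ontology can.
CHILDREN = {
    "root": ["binding", "catalysis"],
    "binding": ["protein_binding", "binding_kinase"],
    "catalysis": ["kinase", "binding_kinase"],
    "protein_binding": [],
    "kinase": [],
    "binding_kinase": [],
}

# Hypothetical per-term scores for one document (stand-ins for classifier outputs).
scores = {
    "binding": 0.9,
    "catalysis": 0.2,
    "protein_binding": 0.7,
    "kinase": 0.8,
    "binding_kinase": 0.1,
}


def flat_predict(scores, threshold=0.5):
    """Flat multi-label: threshold every term independently, ignoring the graph."""
    return {term for term, score in scores.items() if score >= threshold}


def top_down_predict(scores, children=CHILDREN, root="root", threshold=0.5):
    """Top-down: a term is evaluated only if at least one of its parents was accepted."""
    accepted = set()
    seen = set()
    queue = deque(children[root])  # start from the root's direct children
    while queue:
        term = queue.popleft()
        if term in seen:
            continue  # a term reachable through several parents is scored once
        seen.add(term)
        if scores.get(term, 0.0) >= threshold:
            accepted.add(term)
            queue.extend(children[term])  # descend only below accepted terms
    return accepted


flat_labels = flat_predict(scores)
top_down_labels = top_down_predict(scores)
```

With these made-up scores, the flat rule accepts "kinase" even though its only parent "catalysis" was rejected, whereas the top-down rule never reaches it. That is one simple way in which graph structure can constrain predictions.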
Pages: 15
Related papers
50 in total
  • [1] Multi-label literature classification based on the Gene Ontology graph
    Bo Jin
    Brian Muller
    Chengxiang Zhai
    Xinghua Lu
    [J]. BMC Bioinformatics, 9
  • [2] Ontology based Classification for Multi-label Image Annotation
    Reshma, Ismat Ara
    Ullah, Md Zia
    Aono, Masaki
    [J]. 2014 INTERNATIONAL CONFERENCE OF ADVANCED INFORMATICS: CONCEPT, THEORY AND APPLICATION (ICAICTA), 2014, : 226 - 231
  • [3] Multi-Label Classification using an Ontology
    Traore, Yaya
    Bassole, Didier
    Malo, Sadouanouan
    Sere, Abdoulaye
    [J]. INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (12) : 472 - 476
  • [4] Multi-Label Classification with Label Graph Superimposing
    Wang, Ya
    He, Dongliang
    Li, Fu
    Long, Xiang
    Zhou, Zhichao
    Ma, Jinwen
    Wen, Shilei
    [J]. THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 12265 - 12272
  • [5] Comparison of Protein Descriptors Used in Hierarchical Multi-label Classification Based on Gene Ontology
    Pavlovikj, Natasha
    Ivanoska, Ilinka
    Kalajdziski, Slobodan
    [J]. ICT INNOVATIONS 2011, 2011, 150 : 61 - 71
  • [6] Ontology-based multi-label classification of economic articles
    Vogrincic, Sergeja
    Bosnic, Zoran
    [J]. COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2011, 8 (01) : 101 - 119
  • [7] Knowledge Graph Constraints for Multi-label Graph Classification
    Ringsquandl, Martin
    Lamparter, Steffen
    Thon, Ingo
    Lepratti, Raffaello
    Kroeger, Peer
    [J]. 2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), 2016, : 121 - 127
  • [8] PROXIMITY-BASED GRAPH EMBEDDINGS FOR MULTI-LABEL CLASSIFICATION
    Mu, Tingting
    Ananiadou, Sophia
    [J]. KDIR 2010: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND INFORMATION RETRIEVAL, 2010, : 74 - 84
  • [9] Label Correlation Based Graph Convolutional Network for Multi-label Text Classification
    Huy-The Vu
    Minh-Tien Nguyen
    Van-Chien Nguyen
    Manh-Tran Tien
    Van-Hau Nguyen
    [J]. 2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [10] A multi-label classification approach based on ontology and structure weight strategy
    [J]. Yang, F. (yangfq147@nenu.edu.cn), 1600, ICIC Express Letters Office, Tokai University, Kumamoto Campus, 9-1-1, Toroku, Kumamoto, 862-8652, Japan (07):