Identifying protein subcellular localisation in scientific literature using bidirectional deep recurrent neural network

Cited by: 2
Authors
David, Rakesh [1 ]
Menezes, Rhys-Joshua D. [2 ]
De Klerk, Jan [2 ]
Castleden, Ian R. [3 ]
Hooper, Cornelia M. [3 ]
Carneiro, Gustavo [2 ]
Gilliham, Matthew [1 ]
Affiliations
[1] Univ Adelaide, Sch Agr Food & Wine, ARC Ctr Excellence Plant Energy Biol, Waite Res Inst, Waite Campus, Adelaide, SA, Australia
[2] Univ Adelaide, Australian Inst Machine Learning, Sch Comp Sci, Adelaide, SA, Australia
[3] Univ Western Australia, ARC Ctr Excellence Plant Energy Biol, Perth, WA, Australia
Funding
Australian Research Council;
Keywords
EXTRACTION; BIOLOGY; ENTITY; MODEL;
DOI
10.1038/s41598-020-80441-8
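Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes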
07 ; 0710 ; 09 ;
Abstract
The increased diversity and scale of published biological data have led to a growing appreciation for the applications of machine learning and statistical methodologies to gain new insights. Key to achieving this aim is solving the Relationship Extraction problem, which specifies the semantic interaction between two or more biological entities in a published study. Here, we employed two deep neural network natural language processing (NLP) methods, namely the continuous bag of words (CBOW) and the bi-directional long short-term memory (bi-LSTM). These methods were employed to predict relations between entities that describe protein subcellular localisation in plants. We applied our system to 1700 published Arabidopsis protein subcellular studies from the SUBA manually curated dataset. The system combines pre-processing of full-text articles in a machine-readable format with relevant sentence extraction for downstream NLP analysis. Using the SUBA corpus, the neural network classifier predicted interactions between protein name, subcellular localisation and experimental methodology with average precision, recall, accuracy and F1 score of 95.1%, 82.8%, 89.3% and 88.4%, respectively (n=30). Comparable scoring metrics were obtained using the CropPAL database, which stores protein subcellular localisation in crop species, as an independent testing dataset, demonstrating the wide applicability of the prediction model. We provide a framework for extracting protein functional features from unstructured text in the literature with high accuracy, improving data dissemination and unlocking the potential of big data text analytics for generating new hypotheses.
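The four scores quoted in the abstract are standard binary-classification metrics. The sketch below shows how they are conventionally computed from confusion-matrix counts. It is not the authors' evaluation code: the `Confusion` class, the `scores` function and the example counts are hypothetical and chosen only for illustration. Note also that the abstract reports averages over n=30, so the averaged F1 need not equal the F1 computed from the averaged precision and recall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion-matrix counts for a binary relation classifier (hypothetical container)."""
    tp: int  # relation present, predicted present
    fp: int  # relation absent, predicted present
    fn: int  # relation present, predicted absent
    tn: int  # relation absent, predicted absent


def scores(c: Confusion) -> dict:
    """Return precision, recall, accuracy and F1 using the standard definitions."""
    predicted_pos = c.tp + c.fp
    actual_pos = c.tp + c.fn
    total = c.tp + c.fp + c.fn + c.tn

    # Guard against empty denominators by returning 0.0 for the affected metric.
    precision = c.tp / predicted_pos if predicted_pos else 0.0
    recall = c.tp / actual_pos if actual_pos else 0.0
    accuracy = (c.tp + c.tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


if __name__ == "__main__":
    # Made-up counts for one evaluation run; not taken from the paper.
    run = Confusion(tp=8, fp=2, fn=2, tn=8)
    for name, value in scores(run).items():
        print(f"{name}: {value:.3f}")
```

To mirror the reporting style of the abstract, one would compute these scores once per run and then average each metric across runs.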
Pages: 11
Related papers
50 in total
  • [1] Identifying protein subcellular localisation in scientific literature using bidirectional deep recurrent neural network
    Rakesh David
    Rhys-Joshua D. Menezes
    Jan De Klerk
    Ian R. Castleden
    Cornelia M. Hooper
    Gustavo Carneiro
    Matthew Gilliham
    Scientific Reports, 11
  • [2] Localisation in Wireless Networks using Deep Bidirectional Recurrent Neural Networks
    Lynch, David
    Ho, Lester
    MacDonald, Michael
    O'Neill, Michael
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [3] Electricity Theft Detection Using Deep Bidirectional Recurrent Neural Network
    Chen, Zhongtao
    Meng, De
    Zhang, Yufan
    Xin, Tinglin
    Xiao, Ding
    2020 22ND INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT): DIGITAL SECURITY GLOBAL AGENDA FOR SAFE SOCIETY!, 2020, : 401 - 406
  • [4] Protein Solvent-Accessibility Prediction by a Stacked Deep Bidirectional Recurrent Neural Network
    Zhang, Buzhong
    Li, Linqing
    Lu, Qiang
    BIOMOLECULES, 2018, 8 (02)
  • [5] HumDLoc: Human Protein Subcellular Localization Prediction Using Deep Neural Network
    Semwal, Rahul
    Varadwaj, Pritish Kumar
    CURRENT GENOMICS, 2020, 21 (07) : 546 - 557
  • [6] A Deep Bidirectional LSTM Recurrent Neural Networks For Identifying Humans Indoors Using Channel State Information
    Nkabiti, Kabo Poloko
    Chen, Yueyun
    Sultan, Kashif
    Armand, Bika
    2019 28TH WIRELESS AND OPTICAL COMMUNICATIONS CONFERENCE (WOCC), 2019, : 266 - 270
  • [7] Online Proactive Caching in Mobile Edge Computing Using Bidirectional Deep Recurrent Neural Network
    Ale, Laha
    Zhang, Ning
    Wu, Huici
    Chen, Dajiang
    Han, Tao
    IEEE INTERNET OF THINGS JOURNAL, 2019, 6 (03) : 5520 - 5530
  • [8] A DEEP NEURAL NETWORK APPROACH FOR THE PREDICTION OF PROTEIN SUBCELLULAR LOCALIZATION
    Samson, A. B. P.
    Chandra, S. R. A.
    Manikant, M.
    NEURAL NETWORK WORLD, 2021, 31 (01) : 29 - 45
  • [9] Online Signature Verification Using Bidirectional Recurrent Neural Network
    Nathwani, Chirag
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTING AND CONTROL SYSTEMS (ICICCS 2020), 2020, : 1076 - 1078
  • [10] Classification of Speaker Intoxication Using a Bidirectional Recurrent Neural Network
    Berninger, Kim
    Hoppe, Jannis
    Milde, Benjamin
    TEXT, SPEECH, AND DIALOGUE, 2016, 9924 : 435 - 442