Identifying protein subcellular localisation in scientific literature using bidirectional deep recurrent neural network

Cited by: 2
Authors
David, Rakesh [1]
Menezes, Rhys-Joshua D. [2]
De Klerk, Jan [2]
Castleden, Ian R. [3]
Hooper, Cornelia M. [3]
Carneiro, Gustavo [2]
Gilliham, Matthew [1]
Affiliations
[1] Univ Adelaide, Sch Agr Food & Wine, ARC Ctr Excellence Plant Energy Biol, Waite Res Inst, Waite Campus, Adelaide, SA, Australia
[2] Univ Adelaide, Australian Inst Machine Learning, Sch Comp Sci, Adelaide, SA, Australia
[3] Univ Western Australia, ARC Ctr Excellence Plant Energy Biol, Perth, WA, Australia
Funding
Australian Research Council;
Keywords
EXTRACTION; BIOLOGY; ENTITY; MODEL;
DOI
10.1038/s41598-020-80441-8
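Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general];
Subject classification codes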
07 ; 0710 ; 09 ;
Abstract
The increased diversity and scale of published biological data has led to a growing appreciation for the applications of machine learning and statistical methodologies to gain new insights. Key to achieving this aim is solving the relationship extraction problem, which specifies the semantic interaction between two or more biological entities in a published study. Here, we employed two deep neural network natural language processing (NLP) methods, namely the continuous bag-of-words (CBOW) model and the bidirectional long short-term memory (bi-LSTM) network. These methods were employed to predict relations between entities that describe protein subcellular localisation in plants. We applied our system to 1700 published Arabidopsis protein subcellular studies from the SUBA manually curated dataset. The system combines pre-processing of full-text articles into a machine-readable format with relevant sentence extraction for downstream NLP analysis. Using the SUBA corpus, the neural network classifier predicted interactions between protein name, subcellular localisation and experimental methodology with average precision, recall, accuracy and F1 scores of 95.1%, 82.8%, 89.3% and 88.4%, respectively (n = 30). Comparable scores were obtained using the CropPAL database, which stores protein subcellular localisation data for crop species, as an independent test dataset, demonstrating the wide applicability of the prediction model. We provide a framework for extracting protein functional features from unstructured text in the literature with high accuracy, improving data dissemination and unlocking the potential of big data text analytics for generating new hypotheses.
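The abstract reports precision, recall, accuracy and F1 averaged over n = 30 runs. The sketch below shows the standard definitions of these four metrics computed from confusion-matrix counts. It assumes a binary relation classifier and assumes each metric is averaged per run, because the exact averaging protocol is not stated in this record. The names `Counts`, `scores` and `mean_scores` are hypothetical, and the counts in the example are illustrative only, not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical confusion-matrix counts for one evaluation run."""
    tp: int  # relations predicted and present in the curated record
    fp: int  # relations predicted but absent from the curated record
    fn: int  # curated relations missed by the classifier
    tn: int  # negative candidates correctly rejected


def scores(c: Counts) -> dict[str, float]:
    """Standard precision, recall, accuracy and F1 for a single run."""
    precision = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    recall = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


def mean_scores(runs: list[Counts]) -> dict[str, float]:
    """Average each metric over runs (assumed protocol, e.g. n = 30 repeats)."""
    per_run = [scores(c) for c in runs]
    return {k: sum(r[k] for r in per_run) / len(per_run) for k in per_run[0]}


if __name__ == "__main__":
    # Illustrative counts only, not results from the study
    print(scores(Counts(tp=90, fp=5, fn=15, tn=90)))
```

Averaging per-run F1 scores is not the same as taking the harmonic mean of averaged precision and recall. Substituting the reported 95.1% and 82.8% into 2PR/(P + R) gives about 88.5%. This is close to the reported 88.4% but not identical, which is consistent with per-run averaging without proving it.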
Pages: 11