Text-Mining to Identify Gene Sets Involved in Biocorrosion by Sulfate-Reducing Bacteria: A Semi-Automated Workflow

Cited by: 8
Authors
Thakur, Payal [1, 2]
Alaba, Mathew O. [3]
Rauniyar, Shailabh [1, 4]
Singh, Ram Nageena [1, 4]
Saxena, Priya [1, 2]
Bomgni, Alain [3]
Gnimpieba, Etienne Z. [2, 3, 4]
Lushbough, Carol [3]
Goh, Kian Mau [5]
Sani, Rajesh Kumar [1, 2, 4, 6, 7]
Affiliations
[1] South Dakota Sch Mines & Technol, Dept Chem & Biol Engn, Rapid City, SD 57701 USA
[2] South Dakota Sch Mines & Technol, Data Driven Mat Discovery Ctr Bioengn Innovat, Rapid City, SD 57701 USA
[3] Univ South Dakota, Dept Biomed Engn, Sioux Falls, SD 57069 USA
[4] South Dakota Sch Mines & Technol, Dimens Mat Biofilm Engn Sci & Technol 2, Rapid City, SD 57701 USA
[5] Univ Teknol Malaysia, Fac Sci, Skudai 81310, Johor, Malaysia
[6] South Dakota Sch Mines & Technol, BuG ReMeDEE Consortium, Rapid City, SD 57701 USA
[7] Composite & Nanocomposite Adv Mfg Ctr Biomat, Rapid City, SD 57701 USA
Funding
U.S. National Science Foundation;
Keywords
biocorrosion; sulfate-reducing bacteria; text mining; metal ion; sulfur metabolism; DESULFOVIBRIO-VULGARIS HILDENBOROUGH; MICROBIALLY INFLUENCED CORROSION; ELECTRON-TRANSFER; OXIDATIVE STRESS; OUTER-MEMBRANE; IRON; HYDROGENASES; BIOFILM; SYSTEMS; STEEL;
DOI
10.3390/microorganisms11010119
Chinese Library Classification number
Q93 [Microbiology];
Discipline classification code
071005; 100705;
Abstract
The large body of literature on biocorrosion makes manual extraction of crucial information, such as genes and proteins, a laborious task. Despite the rapid growth of biology-related corrosion studies, only a limited number of gene collections relate to the corrosion process (biocorrosion). Text mining offers a potential solution by automatically extracting essential information from unstructured text. We present a text mining workflow that extracts biocorrosion-associated genes/proteins in sulfate-reducing bacteria (SRB) from literature databases (e.g., PubMed and PMC). This semi-automatic workflow is built on the Named Entity Recognition (NER) method and a Convolutional Neural Network (CNN) model. With PubMed and PMC identifiers (PMCIDs) as inputs, the workflow identified 227 genes belonging to several Desulfovibrio species. To validate their functions, Gene Ontology (GO) enrichment and biological network analyses were performed using UniProtKB and STRING-DB, respectively. The GO analysis showed that metal ion binding, sulfur binding, and electron transport were among the principal molecular functions. Furthermore, the biological network analysis generated three interlinked clusters containing genes involved in metal ion binding, cellular respiration, and electron transfer, which suggests that the extracted gene set is involved in biocorrosion. Finally, the dataset was validated through manual curation, which yielded a set of genes similar to that produced by our workflow; among these, hysB and hydA were identified as metal ion binding genes, and sat and dsrB as sulfur metabolism genes. The identified genes were mapped to the pangenome of 63 SRB genomes, which gave the distribution of these genes across the 63 SRB based on amino acid sequence similarity; the genes were further categorized into core and accessory gene families.
SRB's role in biocorrosion involves the transfer of electrons from the metal surface, via a hydrogen medium, to the sulfate reduction pathway. Therefore, genes encoding hydrogenases and cytochromes might participate in removing hydrogen from the metal through electron transfer. Moreover, the production of corrosive sulfide by sulfur metabolism indirectly contributes to localized pitting of the metal. Having corroborated the text mining results against SRB biocorrosion mechanisms, we suggest that the text mining framework could be used for gene/protein extraction and could significantly reduce manual curation time.
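
The core/accessory categorization mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `classify_gene_families`, the `core_fraction` threshold, the genome IDs, and the family names are all hypothetical, and the presence data are invented toy values rather than results from the paper. The sketch assumes a gene family counts as core when it is present in every genome and as accessory otherwise.

```python
def classify_gene_families(presence, n_genomes, core_fraction=1.0):
    """Split gene families into core and accessory lists.

    presence: dict mapping a gene family name to the set of genome IDs
              in which that family was detected (hypothetical input format).
    n_genomes: total number of genomes in the pangenome.
    core_fraction: fraction of genomes a family must occur in to be
                   called core (assumed default: present in all genomes).
    """
    core, accessory = [], []
    for family, genomes in sorted(presence.items()):
        if len(genomes) >= core_fraction * n_genomes:
            core.append(family)
        else:
            accessory.append(family)
    return core, accessory


# Toy example with three made-up genomes and four made-up families.
genomes = ["G1", "G2", "G3"]
presence = {
    "family_A": {"G1", "G2", "G3"},
    "family_B": {"G1", "G2", "G3"},
    "family_C": {"G1", "G3"},
    "family_D": {"G2"},
}

core, accessory = classify_gene_families(presence, len(genomes))
print(core)       # ['family_A', 'family_B']
print(accessory)  # ['family_C', 'family_D']
```

Lowering `core_fraction` (for example to 0.95) would give a softer definition of core that tolerates a family being missed in a few genomes; which threshold the study used is not stated in this record.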
Pages: 18