Dataset from a human-in-the-loop approach to identify functionally important protein residues from literature

Authors
Vollmar, Melanie [1]
Tirunagari, Santosh [2]
Harrus, Deborah [1]
Armstrong, David [1]
Gaborova, Romana [3]
Gupta, Deepti [1]
Afonso, Marcelo Querino Lima [1]
Evans, Genevieve [1]
Velankar, Sameer [1]
Affiliations
[1] European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Protein Data Bank in Europe, Wellcome Genome Campus, Cambridge CB10 1SD, England
[2] European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge CB10 1SD, England
[3] Masaryk University, CEITEC Central European Institute of Technology, Kamenice 5, Brno 62500, Czech Republic
Keywords
MECHANISM; COMPLEX; ONTOLOGY; DOMAIN
DOI
10.1038/s41597-024-03841-9
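Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09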
Abstract
We present a system that keeps curators in the loop to develop a dataset and model for detecting structural features and functional annotations at the residue level from standard publication text. Our approach integrates data from multiple resources, including PDBe, Europe PMC, PubMed Central, and PubMed, with annotation guidelines from UniProt, and uses LitSuggest and Hugging Face models as tools in the annotation process. A team of seven annotators manually curated ten articles for named entities, which we used to train a starting PubMedBERT model from Hugging Face. Using a human-in-the-loop annotation system, we iteratively developed the best model, which achieved a precision of 0.90, a recall of 0.92, and an F1-measure of 0.91. The system shows that machine learning techniques and human expertise can be combined to curate a dataset for residue-level functional annotations and protein structure features. The results suggest potential for broader applications in protein research, bridging the gap between advanced machine learning models and the insights of domain experts.
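As a quick consistency check on the figures quoted in the abstract, the F1-measure can be recomputed as the harmonic mean of precision and recall. The sketch below is illustrative only: the function name `f1_score` is a hypothetical helper, and the abstract does not say how the authors averaged scores across entity types, so this reproduces only the arithmetic relationship between the three reported numbers.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1-measure)."""
    if precision + recall == 0:
        # Avoid division by zero when both inputs are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract.
reported_precision = 0.90
reported_recall = 0.92

f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # rounds to the reported 0.91
```

The recomputed value agrees with the reported F1-measure to two decimal places.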
Pages: 18