Dataset from a human-in-the-loop approach to identify functionally important protein residues from literature

Authors
Vollmar, Melanie [1]
Tirunagari, Santosh [2]
Harrus, Deborah [1]
Armstrong, David [1]
Gaborova, Romana [3]
Gupta, Deepti [1]
Afonso, Marcelo Querino Lima [1]
Evans, Genevieve [1]
Velankar, Sameer [1]
Affiliations
[1] European Mol Biol Lab European Bioinformat Inst EM, Prot Data Bank Europe, Wellcome Genome Campus, Cambridge CB10 1SD, England
[2] European Bioinformat Inst EMBL EBI, European Mol Biol Lab, Wellcome Genome Campus, Cambridge CB10 1SD, England
[3] Masaryk Univ, CEITEC Cent European Inst Technol, Kamenice 5, Brno 62500, Czech Republic
Keywords
MECHANISM; COMPLEX; ONTOLOGY; DOMAIN
DOI
10.1038/s41597-024-03841-9
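Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract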
We present a novel system that leverages curators in the loop to develop a dataset and model for detecting structure features and functional annotations at residue-level from standard publication text. Our approach involves the integration of data from multiple resources, including PDBe, EuropePMC, PubMedCentral, and PubMed, combined with annotation guidelines from UniProt, and LitSuggest and HuggingFace models as tools in the annotation process. A team of seven annotators manually curated ten articles for named entities, which we utilized to train a starting PubmedBert model from HuggingFace. Using a human-in-the-loop annotation system, we iteratively developed the best model with commendable performance metrics of 0.90 for precision, 0.92 for recall, and 0.91 for F1-measure. Our proposed system showcases a successful synergy of machine learning techniques and human expertise in curating a dataset for residue-level functional annotations and protein structure features. The results demonstrate the potential for broader applications in protein research, bridging the gap between advanced machine learning models and the indispensable insights of domain experts.
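As a quick consistency check on the reported metrics, the short sketch below recomputes the F1-measure from the precision and recall given in the abstract, using the standard definition of F1 as the harmonic mean of the two. It assumes the three figures are aggregate scores over the same evaluation set; the abstract does not say how they were averaged, and the function name `f1_score` is chosen here for illustration only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (standard F1)."""
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (assumed to be aggregate scores).
reported_precision = 0.90
reported_recall = 0.92

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}")  # prints "F1 = 0.91"
```

Under that assumption, the recomputed value agrees with the reported F1-measure of 0.91.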
Pages: 18