Dataset from a human-in-the-loop approach to identify functionally important protein residues from literature

Times cited: 0
Authors
Vollmar, Melanie [1]
Tirunagari, Santosh [2]
Harrus, Deborah [1]
Armstrong, David [1]
Gaborova, Romana [3]
Gupta, Deepti [1]
Afonso, Marcelo Querino Lima [1]
Evans, Genevieve [1]
Velankar, Sameer [1]
Affiliations
[1] European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Protein Data Bank in Europe, Wellcome Genome Campus, Cambridge CB10 1SD, England
[2] European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Genome Campus, Cambridge CB10 1SD, England
[3] Masaryk University, CEITEC Central European Institute of Technology, Kamenice 5, Brno 62500, Czech Republic
Keywords
Mechanism; Complex; Ontology; Domain
DOI
10.1038/s41597-024-03841-9
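Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09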
Abstract
We present a novel system that keeps curators in the loop to develop a dataset and a model for detecting protein structure features and residue-level functional annotations in standard publication text. Our approach integrates data from multiple resources, including PDBe, Europe PMC, PubMed Central and PubMed, combined with annotation guidelines from UniProt, and uses LitSuggest and HuggingFace models as tools in the annotation process. A team of seven annotators manually annotated ten articles for named entities, and these annotations were used to train an initial PubMedBERT model from HuggingFace. Using a human-in-the-loop annotation system, we iteratively developed the best model, which achieved a precision of 0.90, a recall of 0.92 and an F1-measure of 0.91. Our system demonstrates a successful combination of machine learning techniques and human expertise in curating a dataset of residue-level functional annotations and protein structure features. The results show potential for broader applications in protein research, bridging the gap between advanced machine learning models and the insights of domain experts.
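The F1-measure reported above is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The following Python sketch checks that the reported precision (0.90) and recall (0.92) are consistent with the reported F1 (0.91), and shows one common way to score named-entity predictions: exact matching of (start, end, label) spans. The exact-match scoring scheme, the function names and the example entity labels are assumptions for illustration only; the abstract does not state how the authors scored their model.

```python
# Illustrative sketch only: exact-match entity scoring is an ASSUMED scheme,
# not necessarily the evaluation used by the authors.


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def entity_scores(gold, predicted):
    """Exact-match entity-level precision, recall and F1.

    gold, predicted: iterables of (start, end, label) tuples; a prediction
    counts as a true positive only if offsets AND label match a gold entity.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall, f1_score(precision, recall)


# Consistency check on the values reported in the abstract.
print(round(f1_score(0.90, 0.92), 2))  # 0.91

# Toy example with HYPOTHETICAL character offsets and entity labels.
gold = [(0, 5, "residue"), (10, 15, "mutant"), (20, 25, "ptm")]
predicted = [(0, 5, "residue"), (10, 15, "mutant"), (30, 35, "ptm")]
print(tuple(round(x, 3) for x in entity_scores(gold, predicted)))  # (0.667, 0.667, 0.667)
```

Because the harmonic mean is pulled towards the smaller of the two values, an F1 of 0.91 sitting between P = 0.90 and R = 0.92 indicates that the model is roughly balanced between missing entities and predicting spurious ones.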
Pages: 18