An approach to describing and analysing bulk biological annotation quality: a case study using UniProtKB

Cited by: 11
Authors
Bell, Michael J. [1]
Gillespie, Colin S. [2]
Swan, Daniel [3]
Lord, Phillip [1]
Affiliations
[1] Newcastle Univ, Sch Comp Sci, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
[2] Newcastle Univ, Sch Math & Stat, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
[3] Newcastle Univ, Bioinformat Support Unit, ICAMB, Sch Med, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
Keywords
GENE ONTOLOGY; SUPPLEMENT TREMBL; PROTEIN FUNCTION; SEQUENCE; DATABASE
DOI
10.1093/bioinformatics/bts372
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually; however, manual curation is costly and time-consuming, and it requires expert knowledge and training. Given these issues and the exponential increase of data, many databases implement automated annotation pipelines in an attempt to avoid un-annotated entries. Both manual and automated annotations vary in quality between databases and annotators, making assessment of annotation reliability problematic for users. The community lacks a generic measure for determining annotation quality and correctness, which we look at addressing within this article. Specifically, we investigate word reuse within bulk textual annotations and relate this to Zipf's Principle of Least Effort. We use the UniProt Knowledgebase (UniProtKB) as a case study to demonstrate this approach, since it allows us to compare annotation change both over time and between automated and manually curated annotations.
Results: By applying power-law distributions to word reuse in annotation, we show clear trends in UniProtKB over time, which are consistent with existing studies of quality on free-text English. Further, we show a clear distinction between manual and automated analysis and investigate cohorts of protein records as they mature. These results suggest that this approach holds distinct promise as a mechanism for judging annotation quality.
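The abstract does not spell out the fitting procedure, so the following is only a minimal sketch of the general idea: count how often each word is reused across a set of annotation strings, then estimate a power-law exponent from those counts. The function names `word_counts` and `estimate_alpha`, the tokenisation rule, the sample sentences and the choice of estimator (a commonly used approximate maximum-likelihood formula for discrete data) are all assumptions made for illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter


def word_counts(annotations):
    """Count how often each lowercased word occurs across all annotation strings."""
    counts = Counter()
    for text in annotations:
        # Hypothetical tokenisation: runs of letters and digits only.
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


def estimate_alpha(frequencies, x_min=1):
    """Approximate maximum-likelihood exponent for discrete values >= x_min.

    Uses alpha ~= 1 + n / sum(ln(x / (x_min - 0.5))). This approximation is
    known to be rough for small x_min, so treat the result as indicative only.
    """
    tail = [x for x in frequencies if x >= x_min]
    if not tail:
        raise ValueError("no observations at or above x_min")
    log_sum = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1.0 + len(tail) / log_sum


# Made-up annotation text, not taken from UniProtKB.
sample = [
    "Catalyzes the phosphorylation of glucose",
    "Catalyzes the hydrolysis of ATP",
    "Binds ATP and glucose",
]
counts = word_counts(sample)
alpha = estimate_alpha(counts.values(), x_min=1)
print(f"{len(counts)} distinct words, estimated alpha = {alpha:.3f}")
```

In practice one would run such an estimate on the full annotation text of each database release and compare the exponents across releases or between manually and automatically annotated entries; a real analysis would also need to choose `x_min` carefully and check goodness of fit.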
Pages: I562-I568
Number of pages: 7