An approach to describing and analysing bulk biological annotation quality: a case study using UniProtKB

Times cited: 11
Authors
Bell, Michael J. [1]
Gillespie, Colin S. [2]
Swan, Daniel [3]
Lord, Phillip [1]
Affiliations
[1] Newcastle Univ, Sch Comp Sci, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
[2] Newcastle Univ, Sch Math & Stat, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
[3] Newcastle Univ, Bioinformat Support Unit, ICAMB, Sch Med, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
Keywords
GENE ONTOLOGY; SUPPLEMENT TREMBL; PROTEIN FUNCTION; SEQUENCE; DATABASE
DOI
10.1093/bioinformatics/bts372
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually; however, manual curation is costly and time-consuming, and it requires expert knowledge and training. Given these issues and the exponential increase in data, many databases implement automated annotation pipelines in an attempt to avoid unannotated entries. Both manual and automated annotations vary in quality between databases and annotators, making assessment of annotation reliability problematic for users. The community lacks a generic measure for determining annotation quality and correctness, a gap that this article sets out to address. Specifically, we investigate word reuse within bulk textual annotations and relate this to Zipf's Principle of Least Effort. We use the UniProt Knowledgebase (UniProtKB) as a case study to demonstrate this approach, since it allows us to compare annotation change both over time and between automated and manually curated annotations.

Results: By fitting power-law distributions to word reuse in annotation, we show clear trends in UniProtKB over time, which are consistent with existing studies of quality in free-text English. Further, we show a clear distinction between manual and automated analysis and investigate cohorts of protein records as they mature. These results suggest that this approach holds distinct promise as a mechanism for judging annotation quality.
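The idea of relating word reuse to a power law can be illustrated with a minimal sketch. It is not taken from the paper: the function names `word_counts` and `estimate_alpha`, the tokenisation rule, and the choice of the standard approximate maximum-likelihood estimator for a discrete power-law exponent are all assumptions made for illustration, and the authors' actual fitting procedure may differ.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count how often each word occurs (hypothetical tokenisation:
    lowercase runs of letters and digits)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def estimate_alpha(counts, xmin=1):
    """Approximate maximum-likelihood exponent for a discrete power law
    fitted to word occurrence counts >= xmin:
        alpha ~= 1 + n / sum(ln(x_i / (xmin - 0.5)))
    This is an assumed estimator, not necessarily the one used in the paper.
    """
    xs = [c for c in counts if c >= xmin]
    if not xs:
        raise ValueError("no counts at or above xmin")
    log_sum = sum(math.log(x / (xmin - 0.5)) for x in xs)
    return 1.0 + len(xs) / log_sum


# Toy annotation text; real input would be the bulk annotation text
# of one database release.
annotation = "binds ATP binds DNA binds zinc kinase kinase"
alpha = estimate_alpha(list(word_counts(annotation).values()))
print(round(alpha, 3))
```

Computing such an exponent separately for each database release, or separately for manually curated and automatically generated entries, would give the kind of comparison the abstract describes.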
Pages: I562-I568
Number of pages: 7