An approach to describing and analysing bulk biological annotation quality: a case study using UniProtKB

Cited by: 11
Authors
Bell, Michael J. [1]
Gillespie, Colin S. [2]
Swan, Daniel [3]
Lord, Phillip [1]
Affiliations
[1] Newcastle Univ, Sch Comp Sci, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
[2] Newcastle Univ, Sch Math & Stat, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
[3] Newcastle Univ, Bioinformat Support Unit, ICAMB, Sch Med, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
Keywords
GENE ONTOLOGY; SUPPLEMENT TREMBL; PROTEIN FUNCTION; SEQUENCE; DATABASE
DOI
10.1093/bioinformatics/bts372
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually; however, manual curation is costly and time consuming, and requires expert knowledge and training. Given these issues and the exponential increase of data, many databases implement automated annotation pipelines in an attempt to avoid un-annotated entries. Both manual and automated annotations vary in quality between databases and annotators, making assessment of annotation reliability problematic for users. The community lacks a generic measure for determining annotation quality and correctness, a gap we begin to address in this article. Specifically, we investigate word reuse within bulk textual annotations and relate this to Zipf's Principle of Least Effort. We use the UniProt Knowledgebase (UniProtKB) as a case study to demonstrate this approach, since it allows us to compare annotation change both over time and between automated and manually curated annotations.
Results: By applying power-law distributions to word reuse in annotation, we show clear trends in UniProtKB over time, which are consistent with existing studies of quality in free-text English. Further, we show a clear distinction between manual and automated analysis and investigate cohorts of protein records as they mature. These results suggest that this approach holds distinct promise as a mechanism for judging annotation quality.
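The idea of fitting a power law to word reuse can be illustrated with a short sketch. This is not the authors' pipeline: the tokenisation rule, the function names `word_counts` and `estimate_alpha`, the choice of `x_min = 1`, and the sample annotation strings are all assumptions made for illustration. The exponent is estimated with the commonly used approximate maximum-likelihood formula for discrete data, alpha ≈ 1 + n / Σ ln(x_i / (x_min − 0.5)).

```python
import math
import re
from collections import Counter


def word_counts(annotations):
    """Count how often each lower-cased word occurs across all annotation strings."""
    counts = Counter()
    for text in annotations:
        # Hypothetical tokenisation: runs of letters and digits only.
        counts.update(re.findall(r"[a-z0-9]+", text.lower()))
    return counts


def estimate_alpha(frequencies, x_min=1):
    """Approximate MLE of a discrete power-law exponent for values >= x_min."""
    tail = [f for f in frequencies if f >= x_min]
    log_sum = sum(math.log(f / (x_min - 0.5)) for f in tail)
    return 1.0 + len(tail) / log_sum


# Made-up annotation strings, purely for illustration (not UniProtKB data).
sample = [
    "Catalyzes the hydrolysis of ATP",
    "Catalyzes the transfer of a phosphate group",
    "Binds ATP and the substrate",
]
counts = word_counts(sample)
alpha = estimate_alpha(counts.values())
```

In a setting like the one the abstract describes, one would presumably compute such an exponent separately for each database release, or for manual and automated annotation sets, and compare the values; a real analysis would also need to choose `x_min` carefully and check goodness of fit.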
Pages: i562-i568
Page count: 7
Related papers
  • [1] Can Inferred Provenance and Its Visualisation Be Used to Detect Erroneous Annotation? A Case Study Using UniProtKB
    Bell, Michael J.
    Collison, Matthew
    Lord, Phillip
    PLOS ONE, 2013, 8 (10):
  • [2] Describing a stalk bulk for analysing the effect of the arrangement of the stalks to the cutting quality
    Niemoeller, Bernd
    Harms, Hans-Heinrich
    Lang, Thorsten
    CONFERENCE: AGRICULTURAL ENGINEERING: LAND-TECHNIK 2010 - PARTNERSCHAFTEN FUR NEUE INNOVATIONSPOTENZIALE, 2010, : 425 - 430
  • [3] An approach for analysing transportation costs and a case study
    Sahin, Bahri
    Yilmaz, Huseyin
    Ust, Yasin
    Guneri, Ali Fuat
    Gulsun, Bahadir
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2009, 193 (01) : 1 - 11
  • [4] Analysing Dependability Case Arguments Using Quality Models
    Huhn, Michaela
    Zechner, Axel
    COMPUTER SAFETY, RELIABILITY, AND SECURITY, PROCEEDINGS, 2009, 5775 : 118 - 131
  • [5] A new method of describing damage in biological tissues using a structural approach
    Arnoux, PJ
    Pithioux, M
    Chabrand, P
    Jean, M
    Bonnoit, J
    COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING - 3, 2001, : 167 - 172
  • [6] Comparison of data annotation approaches using dependency tree annotation as a case study
    Zhou M.
    Gong C.
    Li Z.
    Zhang M.
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2022, 62 (05) : 908 - 916
  • [7] A biological and chemical approach to restoring water quality: A case study in an urban eutrophic pond
    McKercher, Levi J.
    Messer, Tiffany L.
    Mittelstet, Aaron R.
    Comfort, Steve D.
    JOURNAL OF ENVIRONMENTAL MANAGEMENT, 2022, 318
  • [8] Analysing Worker Exposure to WBV at the Donana Biological Reserve (Spain). A Case Study
    Martinez-Aires, Maria D.
    Quiros-Priego, Joaquin
    Lopez-Alonso, Monica
    ADVANCES IN SAFETY MANAGEMENT AND HUMAN FACTORS (AHFE 2018), 2019, 791 : 252 - 263
  • [9] Chemical and biological characterization of paper: A case study using a proposed methodological approach
    Manente, Sabrina
    Micheluz, Anna
    Ganzerla, Renzo
    Ravagnan, Giampietro
    Gambaro, Andrea
    INTERNATIONAL BIODETERIORATION & BIODEGRADATION, 2012, 74 : 99 - 108
  • [10] Analysing the cost of quality within a supply chain using system dynamics approach
    Alglawe, Asama
    Schiffauerova, Andrea
    Kuzgunkaya, Onur
    TOTAL QUALITY MANAGEMENT & BUSINESS EXCELLENCE, 2019, 30 (15-16) : 1630 - 1653