An approach to describing and analysing bulk biological annotation quality: a case study using UniProtKB

Cited by: 11
Authors
Bell, Michael J. [1 ]
Gillespie, Colin S. [2 ]
Swan, Daniel [3 ]
Lord, Phillip [1 ]
Affiliations
[1] Newcastle Univ, Sch Comp Sci, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
[2] Newcastle Univ, Sch Math & Stat, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
[3] Newcastle Univ, Bioinformat Support Unit, ICAMB, Sch Med, Newcastle Upon Tyne NE1 7RU, Tyne & Wear, England
Keywords
GENE ONTOLOGY; SUPPLEMENT TREMBL; PROTEIN FUNCTION; SEQUENCE; DATABASE;
DOI
10.1093/bioinformatics/bts372
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually; however, manual curation is costly, time-consuming and requires expert knowledge and training. Given these issues and the exponential increase of data, many databases implement automated annotation pipelines in an attempt to avoid un-annotated entries. Both manual and automated annotations vary in quality between databases and annotators, making assessment of annotation reliability problematic for users. The community lacks a generic measure for determining annotation quality and correctness, a gap we aim to address in this article. Specifically, we investigate word reuse within bulk textual annotations and relate this to Zipf's Principle of Least Effort. We use the UniProt Knowledgebase (UniProtKB) as a case study to demonstrate this approach, since it allows us to compare annotation change both over time and between automated and manually curated annotations.
Results: By applying power-law distributions to word reuse in annotation, we show clear trends in UniProtKB over time, which are consistent with existing studies of quality on free-text English. Further, we show a clear distinction between manual and automated analysis and investigate cohorts of protein records as they mature. These results suggest that this approach holds distinct promise as a mechanism for judging annotation quality.
Pages: I562 / I568
Number of pages: 7
Related papers
50 in total
  • [21] A Quality Assurance Approach and Case Study in BSS
    Huang, Teh-Sheng
    Chan, Chia-Yen
    Jeng, Jeu-Yih
    IEEE 30TH INTERNATIONAL CONFERENCE ON ADVANCED INFORMATION NETWORKING AND APPLICATIONS WORKSHOPS (WAINA 2016), 2016, : 867 - 871
  • [22] A NEW APPROACH TO QUALITY ENHANCEMENT: A CASE STUDY
    Arsovski, Slavko
    Arsovski, Zora
    Stefanovic, Miladin
    INTERNATIONAL JOURNAL FOR QUALITY RESEARCH, 2011, 5 (04) : 261 - 267
  • [23] Analysing Productivity Changes Using the Bootstrapped Malmquist Approach: The Case of the Iranian Banking Industry
    Arjomandi, Amir
    Valadkhani, Abbas
    Harvie, Charles
    AUSTRALASIAN ACCOUNTING BUSINESS AND FINANCE JOURNAL, 2011, 5 (03) : 35 - +
  • [24] Methodological processes in validating and analysing the quality of population-based data: a case study using the Victorian Perinatal Data Collection
    Davey, Mary-Ann
    Sloan, Mary-Louise
    Palma, Sonia
    Riley, Merilyn
    King, James
    HEALTH INFORMATION MANAGEMENT JOURNAL, 2013, 42 (03) : 12 - 19
  • [25] Graph theoretic approach for analysing the readiness of an organisation for adapting lean thinking A case study
    Gurumurthy, Anand
    Mazumdar, Prasoon
    Muthusubramanian, Sowmiya
    INTERNATIONAL JOURNAL OF ORGANIZATIONAL ANALYSIS, 2013, 21 (03) : 396 - +
  • [26] Modelling-based approach of analysing diversion impacts: a case study of the Brahmaputra basin
    Dutta, Pulendra
    Sarma, Arup Kumar
    CURRENT SCIENCE, 2020, 119 (06): : 1010 - 1018
  • [27] Career versus motherhood? A case study describing a cognitive-existential approach to the dilemma
    Dingle, G
    BEHAVIOUR CHANGE, 2002, 19 (01) : 2 - 11
  • [28] Determination of rock quality designation (RQD) using a novel geophysical approach: a case study
    Muhammad Hasan
    Yanjun Shang
    Xuetao Yi
    Peng Shao
    He Meng
    Bulletin of Engineering Geology and the Environment, 2023, 82
  • [29] Determination of rock quality designation (RQD) using a novel geophysical approach: a case study
    Hasan, Muhammad
    Shang, Yanjun
    Yi, Xuetao
    Shao, Peng
    Meng, He
    BULLETIN OF ENGINEERING GEOLOGY AND THE ENVIRONMENT, 2023, 82 (03)
  • [30] Biological indices of soil quality: an ecosystem case study of their use
    Knoepp, JD
    Coleman, DC
    Crossley, DA
    Clark, JS
    FOREST ECOLOGY AND MANAGEMENT, 2000, 138 (1-3) : 357 - 368