The language of gene ontology: a Zipf's law analysis

Cited by: 10
Authors
Kalankesh, Leila Ranandeh [1]
Stevens, Robert [1]
Brass, Andy [1,2]
Affiliations
[1] Univ Manchester, Sch Comp Sci, Manchester M13 9PL, Lancs, England
[2] Univ Manchester, Fac Life Sci, Manchester M13 9PL, Lancs, England
Source
BMC Bioinformatics | 2012 / Vol. 13
Keywords
LEAST EFFORT; ANNOTATION; DISTRIBUTIONS; DATABASE
DOI
10.1186/1471-2105-13-127
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Most major genome projects and sequence databases provide a GO annotation of their data, either automatically or through human annotators, creating a large corpus of data written in the language of GO. Texts written in natural language show a statistical power law behaviour, Zipf's law, the exponent of which can provide useful information on the nature of the language being used. We have therefore explored the hypothesis that collections of GO annotations will show similar statistical behaviours to natural language.
Results: Annotations from the Gene Ontology Annotation project were found to follow Zipf's law. Surprisingly, the measured power law exponents were consistently different between annotation captured using the three GO sub-ontologies in the corpora (function, process and component). On filtering the corpora using GO evidence codes we found that the value of the measured power law exponent responded in a predictable way as a function of the evidence codes used to support the annotation.
Conclusions: Techniques from computational linguistics can provide new insights into the annotation process. GO annotations show similar statistical behaviours to those seen in natural language with measured exponents that provide a signal which correlates with the nature of the evidence codes used to support the annotations, suggesting that the measured exponent might provide a signal regarding the information content of the annotation.
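To make the "measured power law exponent" in the abstract concrete, here is a minimal Python sketch of one common way to estimate a Zipf exponent: rank the terms by frequency and fit a straight line to log(frequency) against log(rank). The abstract does not say how the authors fitted their exponents, so the least-squares fit, the function name `zipf_exponent`, and the toy term labels are all assumptions made for illustration, not the paper's method or data.

```python
import math
from collections import Counter


def zipf_exponent(terms):
    """Estimate a Zipf exponent from a sequence of term occurrences.

    Assumption: ordinary least squares on log(rank) vs log(frequency).
    This is a simple illustrative estimator, not the method of the paper.
    """
    # Frequencies sorted from most to least common; rank 1 is the most common.
    counts = sorted(Counter(terms).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct terms")

    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Zipf's law: frequency ~ rank ** (-alpha), so alpha is minus the slope.
    return -sxy / sxx


# Hypothetical toy corpus (placeholder labels, not real GO annotation counts).
# Frequencies are 120 / rank, i.e. an idealised Zipf distribution.
toy_corpus = (
    ["term_a"] * 120 + ["term_b"] * 60 + ["term_c"] * 40 + ["term_d"] * 30
)
alpha = zipf_exponent(toy_corpus)
print(f"estimated exponent: {alpha:.3f}")
```

Because the toy frequencies are exactly 120 divided by the rank, the estimate comes out at approximately 1. On a real annotation corpus the same function could be applied separately to the terms of each sub-ontology, or to the subset supported by a given evidence code, to compare exponents in the way the abstract describes. A plain log-log regression can be biased for real data, so it is best treated as a first look rather than a definitive fit.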
Pages: 7