The language of gene ontology: a Zipf's law analysis

Cited by: 10
Authors
Kalankesh, Leila Ranandeh [1]
Stevens, Robert [1]
Brass, Andy [1, 2]
Affiliations
[1] Univ Manchester, Sch Comp Sci, Manchester M13 9PL, Lancs, England
[2] Univ Manchester, Fac Life Sci, Manchester M13 9PL, Lancs, England
Source
BMC BIOINFORMATICS | 2012 / Volume 13
Keywords
LEAST EFFORT; ANNOTATION; DISTRIBUTIONS; DATABASE
DOI
10.1186/1471-2105-13-127
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Most major genome projects and sequence databases provide a GO annotation of their data, either automatically or through human annotators, creating a large corpus of data written in the language of GO. Texts written in natural language show a statistical power law behaviour, Zipf's law, the exponent of which can provide useful information on the nature of the language being used. We have therefore explored the hypothesis that collections of GO annotations will show similar statistical behaviours to natural language.

Results: Annotations from the Gene Ontology Annotation project were found to follow Zipf's law. Surprisingly, the measured power law exponents were consistently different between annotation captured using the three GO sub-ontologies in the corpora (function, process and component). On filtering the corpora using GO evidence codes we found that the value of the measured power law exponent responded in a predictable way as a function of the evidence codes used to support the annotation.

Conclusions: Techniques from computational linguistics can provide new insights into the annotation process. GO annotations show similar statistical behaviours to those seen in natural language with measured exponents that provide a signal which correlates with the nature of the evidence codes used to support the annotations, suggesting that the measured exponent might provide a signal regarding the information content of the annotation.
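
Illustrative note: Zipf's law states that the frequency f of the term at rank r falls off roughly as f(r) ∝ r^(-α), so the exponent α can be read off as the negated slope of log(frequency) against log(rank). The sketch below shows one common way to estimate it, an ordinary least-squares fit in log-log space. It is not taken from the paper, whose fitting procedure is not described in the abstract. The function name `zipf_exponent` and the toy term labels are hypothetical.

```python
import math
from collections import Counter


def zipf_exponent(terms):
    """Estimate a Zipf exponent as the negated least-squares slope
    of log(frequency) against log(rank).

    Assumption: a plain log-log regression, chosen for illustration;
    the paper's own estimation method is not stated in the abstract.
    """
    # Frequencies sorted from most to least common define the ranks 1..n.
    counts = sorted(Counter(terms).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct terms")

    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(freq) for freq in counts]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # The fitted slope is negative for a decaying distribution.
    return -sxy / sxx


# Made-up corpus with placeholder labels (not real GO identifiers).
# Frequencies 12, 6, 4, 3 follow f(r) = 12 / r for ranks 1 to 4.
corpus = ["term_a"] * 12 + ["term_b"] * 6 + ["term_c"] * 4 + ["term_d"] * 3
alpha = zipf_exponent(corpus)
```

On real annotation data the tail of the rank-frequency curve is noisy, so a simple regression like this can be biased. Maximum-likelihood fitting is a frequently used alternative.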
Pages: 7