The language of gene ontology: a Zipf's law analysis

Cited by: 10
Authors
Kalankesh, Leila Ranandeh [1]
Stevens, Robert [1]
Brass, Andy [1, 2]
Affiliations
[1] Univ Manchester, Sch Comp Sci, Manchester M13 9PL, Lancs, England
[2] Univ Manchester, Fac Life Sci, Manchester M13 9PL, Lancs, England
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Keywords
LEAST EFFORT; ANNOTATION; DISTRIBUTIONS; DATABASE;
DOI
10.1186/1471-2105-13-127
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Most major genome projects and sequence databases provide a GO annotation of their data, either automatically or through human annotators, creating a large corpus of data written in the language of GO. Texts written in natural language show a statistical power law behaviour, Zipf's law, the exponent of which can provide useful information on the nature of the language being used. We have therefore explored the hypothesis that collections of GO annotations will show similar statistical behaviours to natural language.

Results: Annotations from the Gene Ontology Annotation project were found to follow Zipf's law. Surprisingly, the measured power law exponents were consistently different between annotation captured using the three GO sub-ontologies in the corpora (function, process and component). On filtering the corpora using GO evidence codes we found that the value of the measured power law exponent responded in a predictable way as a function of the evidence codes used to support the annotation.

Conclusions: Techniques from computational linguistics can provide new insights into the annotation process. GO annotations show similar statistical behaviours to those seen in natural language with measured exponents that provide a signal which correlates with the nature of the evidence codes used to support the annotations, suggesting that the measured exponent might provide a signal regarding the information content of the annotation.
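
As a rough illustration of what measuring a power law exponent on an annotation corpus can involve, the sketch below ranks terms by frequency and fits log(frequency) against log(rank) by ordinary least squares. This is an assumption made for illustration only: the abstract does not state which fitting procedure the authors used, and the function name `zipf_exponent` and the `TERM_n` placeholder labels are hypothetical, not real GO identifiers.

```python
import math
from collections import Counter


def zipf_exponent(terms):
    """Estimate a Zipf exponent alpha for a sequence of term labels.

    Fits log(frequency) = c - alpha * log(rank) by ordinary least squares
    and returns alpha. Illustrative estimator only; not necessarily the
    procedure used in the paper.
    """
    # Frequencies sorted from most to least common give ranks 1, 2, 3, ...
    counts = sorted(Counter(terms).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct terms to fit a slope")

    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # The fitted slope is negative for a decaying distribution; alpha is its magnitude.
    return -sxy / sxx


# Toy corpus (hypothetical): the term of rank r occurs 120 // r times,
# i.e. frequency proportional to 1 / rank.
toy_corpus = [f"TERM_{r}" for r in range(1, 7) for _ in range(120 // r)]
print(zipf_exponent(toy_corpus))
```

On this toy corpus the estimate comes out close to 1, the classic Zipf value. Least squares on log-log data is simple but known to be a biased estimator for real corpora, so it should be read as a sketch of the idea and not as a recommended method.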
Pages: 7
Related papers
50 items in total
  • [31] Zipf's law and the growth of cities
    Gabaix, X
    AMERICAN ECONOMIC REVIEW, 1999, 89 (02) : 129 - 132
  • [32] Zipf's law for cities: An explanation
    Gabaix, X
    QUARTERLY JOURNAL OF ECONOMICS, 1999, 114 (03) : 739 - 767
  • [33] Dynamical approach to Zipf's law
    De Marzo, Giordano
    Gabrielli, Andrea
    Zaccaria, Andrea
    Pietronero, Luciano
    PHYSICAL REVIEW RESEARCH, 2021, 3 (01):
  • [34] Zipf's Law for Web Surfers
    Mark Levene
    José Borges
    George Loizou
    Knowledge and Information Systems, 2001, 3 (1) : 120 - 129
  • [35] Concentration indices and Zipf's law
    Naldi, M
    ECONOMICS LETTERS, 2003, 78 (03) : 329 - 334
  • [36] Zipf's mean and language typology
    Popescu, Ioan-Iovitz
    Altmann, Gabriel
    GLOTTOMETRICS, 2008, 16 : 31 - 37
  • [37] Territorial Planning and Zipf's Law
    Kabanov, V. N.
    ECONOMIC AND SOCIAL CHANGES-FACTS TRENDS FORECAST, 2019, 12 (02) : 103 - 114
  • [38] A GEOGRAPHICAL THEORY FOR ZIPF'S LAW
    Pumain, Denise
    REGION ET DEVELOPPEMENT, 2012, (36) : 31 - 54
  • [39] Zipf's law and phase transition
    Lukierska-Walasek, K.
    Topolski, K.
    MODERN PHYSICS LETTERS B, 2014, 28 (11):
  • [40] SOME COMMENTS ON ZIPF LAW FOR THE CHINESE-LANGUAGE
    SHTRIKMAN, S
    JOURNAL OF INFORMATION SCIENCE, 1994, 20 (02) : 142 - 143