The language of gene ontology: a Zipf's law analysis

Times cited: 10
Authors
Kalankesh, Leila Ranandeh [1]
Stevens, Robert [1]
Brass, Andy [1,2]
Affiliations
[1] Univ Manchester, Sch Comp Sci, Manchester M13 9PL, Lancs, England
[2] Univ Manchester, Fac Life Sci, Manchester M13 9PL, Lancs, England
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Keywords
LEAST EFFORT; ANNOTATION; DISTRIBUTIONS; DATABASE
DOI
10.1186/1471-2105-13-127
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Most major genome projects and sequence databases provide a GO annotation of their data, either automatically or through human annotators, creating a large corpus of data written in the language of GO. Texts written in natural language show a statistical power law behaviour, Zipf's law, the exponent of which can provide useful information on the nature of the language being used. We have therefore explored the hypothesis that collections of GO annotations will show similar statistical behaviours to natural language. Results: Annotations from the Gene Ontology Annotation project were found to follow Zipf's law. Surprisingly, the measured power law exponents were consistently different between annotation captured using the three GO sub-ontologies in the corpora (function, process and component). On filtering the corpora using GO evidence codes we found that the value of the measured power law exponent responded in a predictable way as a function of the evidence codes used to support the annotation. Conclusions: Techniques from computational linguistics can provide new insights into the annotation process. GO annotations show similar statistical behaviours to those seen in natural language with measured exponents that provide a signal which correlates with the nature of the evidence codes used to support the annotations, suggesting that the measured exponent might provide a signal regarding the information content of the annotation.
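Zipf's law says that the frequency f(r) of the r-th most frequent item falls off roughly as a power of its rank, f(r) ∝ r^(-α). The abstract reports measured exponents α for GO annotation corpora but does not say how they were fitted. The sketch below is one simple way to obtain such an exponent, assuming each annotation contributes one GO term ID as a "word" and fitting a straight line to the log-log rank-frequency data by ordinary least squares. The function name `zipf_exponent` and the placeholder term IDs are hypothetical, and this is not presented as the authors' method.

```python
import math
from collections import Counter


def zipf_exponent(tokens):
    """Estimate the Zipf exponent alpha from a sequence of tokens.

    Assumption: alpha is obtained by ordinary least squares on the
    log-log rank-frequency data, i.e. fitting log f(r) = c - alpha * log r.
    This is a simple illustrative estimator, not the paper's procedure.
    """
    # Frequencies of each distinct token, most frequent first (rank 1).
    freqs = sorted(Counter(tokens).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct tokens to fit a slope")

    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]

    # Closed-form least-squares slope of ys against xs.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx

    # Zipf's law has a negative slope on the log-log plot; report alpha > 0.
    return -slope


# Hypothetical usage: a synthetic "annotation corpus" of placeholder term IDs
# whose frequencies are exactly 60 / rank, so the fitted exponent is 1.
corpus = [f"GO:{rank:07d}" for rank in range(1, 7) for _ in range(60 // rank)]
print(round(zipf_exponent(corpus), 6))  # 1.0
```

Least-squares fitting on a log-log plot is easy to follow but is known to give biased exponent estimates, so maximum-likelihood estimators are usually preferred when the exact value of α matters. Comparing exponents across sub-ontologies or evidence-code subsets, as the abstract describes, would amount to calling such an estimator on each filtered corpus.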
Pages: 7
Related papers
50 in total
  • [41] Maximal Diversity and Zipf's Law
    Mazzarisi, Onofrio
    De Azevedo-Lopes, Amanda
    Arenzon, Jeferson J.
    Corberi, Federico
    PHYSICAL REVIEW LETTERS, 2021, 127 (12)
  • [42] Limit laws for Zipf's law
    Eliazar, Iddo
    JOURNAL OF PHYSICS A-MATHEMATICAL AND THEORETICAL, 2011, 44 (02)
  • [43] ANALYSIS OF ZIPF LAW - AN INDEX APPROACH
    CHEN, YS
    LEIMKUHLER, FF
    INFORMATION PROCESSING & MANAGEMENT, 1987, 23 (03) : 171 - 182
  • [44] Zipf's word frequency law in natural language: A critical review and future directions
    Piantadosi, Steven T.
    PSYCHONOMIC BULLETIN & REVIEW, 2014, 21 (05) : 1112 - 1130
  • [45] On the Implications of Zipf's Law in Passwords
    Wang, Ding
    Wang, Ping
    COMPUTER SECURITY - ESORICS 2016, PT I, 2016, 9878 : 111 - 131
  • [46] Bias in Zipf's law estimators
    Pilgrim, Charlie
    Hills, Thomas T.
    SCIENTIFIC REPORTS, 2021, 11 (01)
  • [48] On the emergence of Zipf's law in music
    Perotti, Juan I.
    Billoni, Orlando V.
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2020, 549 (549)
  • [50] Zipf's Law for Indian Languages
    Jayaram, B. D.
    Vidya, M. N.
    JOURNAL OF QUANTITATIVE LINGUISTICS, 2008, 15 (04) : 293 - 317