Scaling laws and fluctuations in the statistics of word frequencies

Cited by: 38
Authors
Gerlach, Martin [1 ]
Altmann, Eduardo G. [1 ]
Affiliation
[1] Max Planck Inst Phys Complex Syst, D-01187 Dresden, Germany
Source
NEW JOURNAL OF PHYSICS | 2014 / Vol. 16
Keywords
scaling laws; stochastic processes; statistical fluctuations; natural language; GROWTH; DISTRIBUTIONS; INNOVATION; DYNAMICS; ORIGIN;
DOI
10.1088/1367-2630/16/11/113010
Chinese Library Classification number
O4 [Physics];
Discipline classification code
0702;
Abstract
In this paper, we combine statistical analysis of written texts and simple stochastic models to explain the appearance of scaling laws in the statistics of word frequencies. The average vocabulary of an ensemble of fixed-length texts is known to scale sublinearly with the total number of words (Heaps' law). Analyzing the fluctuations around this average in three large databases (Google-ngram, English Wikipedia, and a collection of scientific articles), we find that the standard deviation scales linearly with the average (Taylor's law), in contrast to the prediction of decaying fluctuations obtained using simple sampling arguments. We explain both scaling laws (Heaps' and Taylor's) by modeling the usage of words using a Poisson process with a fat-tailed distribution of word frequencies (Zipf's law) and topic-dependent frequencies of individual words (as in topic models). Considering topical variations leads to quenched averages, turns the vocabulary size into a non-self-averaging quantity, and explains the empirical observations. For the numerous practical applications relying on estimations of vocabulary size, our results show that uncertainties remain large even for long texts. We show how to account for these uncertainties in measurements of lexical richness of texts with different lengths.
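The "simple sampling arguments" mentioned in the abstract can be illustrated with a minimal sketch. It assumes a pure power-law Zipf distribution over a finite vocabulary and fixed (topic-independent) word frequencies, so each word's count in a text of length M is Poisson with mean M·f. The function names, the exponent `gamma`, and the vocabulary size are illustrative choices, not values taken from the paper. Under these assumptions the expected vocabulary grows sublinearly with M, while the relative fluctuation (standard deviation over mean) decays, which is the baseline prediction that the abstract contrasts with the observed Taylor's law.

```python
import math

def zipf_frequencies(num_types, gamma):
    """Normalized rank-frequency list f_r proportional to r**(-gamma).

    Assumption: a single power law over a finite vocabulary of
    num_types words (hypothetical parameters for illustration).
    """
    weights = [r ** -gamma for r in range(1, num_types + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def vocabulary_mean_and_std(freqs, text_length):
    """Mean and standard deviation of the number of distinct words.

    Each word is treated as an independent Poisson process with rate
    text_length * f, so it is absent with probability exp(-M * f).
    Frequencies are fixed: no topic dependence is modeled here.
    """
    mean = 0.0
    var = 0.0
    for f in freqs:
        p_absent = math.exp(-text_length * f)
        mean += 1.0 - p_absent               # word appears at least once
        var += p_absent * (1.0 - p_absent)   # Bernoulli variance per word
    return mean, math.sqrt(var)

if __name__ == "__main__":
    freqs = zipf_frequencies(num_types=100_000, gamma=1.0)
    for m in (1_000, 10_000, 100_000):
        mean, std = vocabulary_mean_and_std(freqs, m)
        print(f"M={m}: mean vocabulary={mean:.0f}, std/mean={std / mean:.4f}")
```

Reproducing the linear scaling of the standard deviation with the mean would require letting the frequencies vary from text to text (for example by topic), which this sketch deliberately leaves out.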
Pages: 18
Related papers
50 in total
  • [21] TURBULENT REFRACTIVE-INDEX FLUCTUATIONS - SCALING LAWS AND CLOSURE MODEL PREDICTIONS
    BURK, SD
    BULLETIN OF THE AMERICAN METEOROLOGICAL SOCIETY, 1980, 61 (11) : 1495 - 1496
  • [22] Spectral scaling laws of solar wind fluctuations at 1 AU: Part 1
    Podesta, John J.
    PROCEEDINGS OF THE THIRTEENTH INTERNATIONAL SOLAR WIND CONFERENCE (SOLAR WIND 13), 2013, 1539 : 122 - 125
  • [23] Scaling Laws of Turbulence and Heating of Fast Solar Wind: The Role of Density Fluctuations
    Carbone, V.
    Marino, R.
    Sorriso-Valvo, L.
    Noullez, A.
    Bruno, R.
    PHYSICAL REVIEW LETTERS, 2009, 103 (06)
  • [24] Scaling laws for a system with long-range interactions within Tsallis statistics
    Salazar, R
    Toral, R
    PHYSICAL REVIEW LETTERS, 1999, 83 (21) : 4233 - 4236
  • [25] Logarithmic and nonlogarithmic scaling laws of two-point statistics in wall turbulence
    Mouri, Hideaki
    Morinaga, Takeshi
    Yagi, Toshimasa
    Mori, Kazuyasu
    PHYSICAL REVIEW E, 2020, 101 (05)
  • [26] Laws, power laws and statistics
    Mark Buchanan
    Nature Physics, 2008, 4 : 339 - 339
  • [27] Laws, power laws and statistics
    Buchanan, Mark
    NATURE PHYSICS, 2008, 4 (05) : 339 - 339
  • [28] A new approach to business fluctuations: heterogeneous interacting agents, scaling laws and financial fragility
    Gatti, DD
    Di Guilmi, C
    Gaffeo, E
    Giulioni, G
    Gallegati, M
    Palestrini, A
    JOURNAL OF ECONOMIC BEHAVIOR & ORGANIZATION, 2005, 56 (04) : 489 - 512
  • [29] The Brevity Law as a Scaling Law, and a Possible Origin of Zipf's Law for Word Frequencies
    Corral, Alvaro
    Serra, Isabel
    ENTROPY, 2020, 22 (02)
  • [30] Comment on "Scaling Laws of Turbulence and Heating of Fast Solar Wind: The Role of Density Fluctuations"
    Forman, M. A.
    Smith, C. W.
    Vasquez, B. J.
    PHYSICAL REVIEW LETTERS, 2010, 104 (18)