Word length, sentence length and frequency - Zipf revisited

Cited by: 87
Authors
Sigurd, B [1]
Eeg-Olofsson, M [1]
van de Weijer, J [1]
Affiliations
[1] Lund Univ, Dept Phonet & Linguist, S-22362 Lund, Sweden
DOI
10.1111/j.0039-3193.2004.00109.x
Chinese Library Classification (CLC) number
H [Language, Writing]
Discipline classification code
05
Abstract
This paper examines data from English, Swedish and German in order to find a theoretical distribution that describes the observed relation between word length and frequency. In Swedish and English, most word tokens consist of three letters only, while shorter or longer words occur less frequently. We found that the equation with the general form $f_{\mathrm{exp}} = a \cdot L^{b} \cdot c^{L}$ (a variant of the so-called gamma distribution) approximates the observed frequencies reasonably well. This formula incorporates both the fact that the number of possible words increases with word length, and the fact that longer words tend to be avoided, presumably because they are uneconomic. To our knowledge, this formula has not been proposed to describe word frequency data. We examined frequency distributions of word length in Swedish and English, and explored different variants of the equation by systematically varying the a, b and c parameters. Subsequently, we also applied the formula to the frequency distribution of sentence length in English, and found an almost perfect fit for a corpus consisting of different text genres. Moreover, the data showed that the formula can be used to distinguish between different kinds of text genres.
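As a rough illustration of how the formula in the abstract behaves, the Python sketch below reads it as a * L^b * c^L, where L is the word length in letters (this reading of the exponents is an assumption). With b > 0 and 0 < c < 1, the L^b factor grows with length while c^L shrinks it, and setting the derivative of ln f to zero (b/L + ln c = 0) puts the peak at L = -b / ln c. The function names and the parameter values are hypothetical: the abstract does not report fitted values, so the numbers are chosen only so that the curve peaks at three letters, as the abstract says it does for Swedish and English.

```python
import math


def expected_frequency(length, a, b, c):
    """Evaluate a * L^b * c^L for a word length L >= 1."""
    return a * length ** b * c ** length


def peak_length(b, c):
    """Real-valued L at which a * L^b * c^L peaks.

    Follows from d/dL [ln a + b ln L + L ln c] = b / L + ln c = 0.
    """
    if b <= 0 or not 0 < c < 1:
        raise ValueError("a single interior peak needs b > 0 and 0 < c < 1")
    return -b / math.log(c)


# Hypothetical parameters (not taken from the paper), chosen only so
# that the peak lands at three letters.
A, B, C = 1.0, 3.0, math.exp(-1.0)

# Curve values for word lengths 1..15 and the integer length with the
# highest expected frequency.
curve = {L: expected_frequency(L, A, B, C) for L in range(1, 16)}
most_frequent_length = max(curve, key=curve.get)
```

Varying b and c moves the peak and changes how quickly the tail falls off, which is one way to picture the systematic variation of parameters mentioned in the abstract; a only rescales the curve.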
Pages: 37-52
Number of pages: 16