Omnibus Sequences, Coupon Collection, and Missing Word Counts

Cited by: 6
Authors
Abraham, Sunil [1]
Brockman, Greg [2]
Sapp, Stephanie [3]
Godbole, Anant P. [4]
Affiliations
[1] Univ Oxford, Oxford, England
[2] MIT, Cambridge, MA 02139 USA
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
[4] E Tennessee State Univ, Johnson City, TN 37614 USA
Funding
National Science Foundation (USA)
Keywords
Coupon collection; Omnibus sequences; Extreme value distribution
DOI
10.1007/s11009-011-9247-6
Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
In this paper, we study the properties of k-omnisequences of length n, defined to be strings of length n that contain all strings of smaller length k embedded as (not necessarily contiguous) subsequences. We start by proving an elementary result that relates our problem to the classical coupon collector problem. After a short survey of relevant results in coupon collection, we focus our attention on the number M of strings (or words) of length k that are not found as subsequences of an n-string, showing that there is a gap between the probability threshold for the emergence of an omnisequence and the zero-infinity threshold for E(M).
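
The link between omnisequences and coupon collection can be illustrated with a small sketch. The abstract does not state the elementary result itself, so the code below assumes the standard greedy argument: scanning a string from left to right, it contains every word of length k as a subsequence exactly when the full alphabet has been collected at least k times in succession. The function names (`completed_collections`, `is_omni`, `missing_word_count`) are hypothetical and chosen only for this illustration; the brute-force count of missing words M is exponential in k and is meant only for checking small cases.

```python
from itertools import product


def completed_collections(s, alphabet):
    """Count how many times a left-to-right scan of s collects the whole alphabet.

    Each time every letter has been seen, the collection restarts from empty,
    mirroring repeated rounds of the coupon collector problem.
    """
    needed = set(alphabet)
    seen = set()
    count = 0
    for ch in s:
        if ch in needed:
            seen.add(ch)
        if seen == needed:
            count += 1
            seen = set()
    return count


def is_omni(s, alphabet, k):
    """Assumed criterion: s is a k-omnisequence iff k full collections complete."""
    return completed_collections(s, alphabet) >= k


def is_subsequence(word, s):
    """True if word appears in s as a (not necessarily contiguous) subsequence."""
    it = iter(s)
    return all(ch in it for ch in word)


def missing_word_count(s, alphabet, k):
    """Brute-force M: the number of length-k words that are not subsequences of s."""
    return sum(
        1 for w in product(alphabet, repeat=k) if not is_subsequence(w, s)
    )


if __name__ == "__main__":
    for s in ("abba", "aabb"):
        print(s, is_omni(s, "ab", 2), missing_word_count(s, "ab", 2))
```

For the binary alphabet and k = 2, the string "abba" completes two collections and misses no words, while "aabb" completes only one and misses the word "ba", so the greedy test and the brute-force count agree on these cases.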
Pages: 363-378
Number of pages: 16