Visualizing bivariate long-tailed data

Cited by: 1
Authors
Dyer, Justin S. [1]
Owen, Art B. [1]
Affiliation
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
Source
Funding
National Science Foundation (USA)
Keywords
Copula; bivariate Zipf; bipartite preferential attachment; preferential attachment; Zipf-Mandelbrot; complex networks
DOI
10.1214/11-EJS622
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Variables in large data sets in biology or e-commerce often have a head, made up of very frequent values, and a long tail of ever rarer values. Models such as the Zipf or Zipf-Mandelbrot provide a good description. The problem we address here is the visualization of two such long-tailed variables, as one might see in a bivariate Zipf context. We introduce a copula plot to display the joint behavior of such variables. The plot uses an empirical ordering of the data; we prove that this ordering is asymptotically accurate in a Zipf-Mandelbrot-Poisson model. We often see an association between entities at the head of one variable and those from the tail of the other. We present two generative models (saturation and bipartite preferential attachment) that show such qualitative behavior, and we characterize the power law behavior of the marginal distributions in these models.
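The abstract does not spell out how the copula plot is built, so the following is only a minimal sketch of one plausible reading, not the authors' construction. It assumes that the entities of each variable are ordered by decreasing observed frequency (the "empirical ordering"), that each entity owns a slice of [0, 1] proportional to its share of the data, and that each observation is jittered uniformly inside its rectangle. The function name `empirical_copula_points`, the tie-breaking rule, and the jittering step are all hypothetical choices made for illustration.

```python
from collections import Counter
import random


def empirical_copula_points(pairs, seed=0):
    """Map each (x, y) observation to a point in the unit square.

    Assumption (not from the source): entities are ordered by decreasing
    observed frequency, ties broken by first appearance, and each entity
    owns an interval of [0, 1] whose width equals its share of the data.
    """
    rng = random.Random(seed)
    n = len(pairs)

    def intervals(values):
        counts = Counter(values)
        # sorted() is stable and Counter keeps insertion order,
        # so tied entities stay in order of first appearance.
        ordered = sorted(counts, key=lambda v: -counts[v])
        out, left = {}, 0.0
        for v in ordered:
            width = counts[v] / n
            out[v] = (left, left + width)
            left += width
        return out

    x_iv = intervals([x for x, _ in pairs])
    y_iv = intervals([y for _, y in pairs])
    # Jitter each observation uniformly inside the rectangle of its (x, y) pair.
    return [(rng.uniform(*x_iv[x]), rng.uniform(*y_iv[y])) for x, y in pairs]


# Toy data: "a" and "p" form the head of their variables, "b" and "q" the tail.
toy_pairs = [("a", "p"), ("a", "q"), ("a", "p"), ("b", "p")]
points = empirical_copula_points(toy_pairs)
```

Scatter-plotting `points` with any plotting library gives a unit-square picture in which head entities sit near the origin on each axis, so a head-to-tail association would appear as mass in the off-diagonal corners.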
Pages: 642-668
Page count: 27