Zero-Inflated Exponential Family Embeddings

Cited by: 0
Authors
Liu, Li-Ping [1, 2]
Blei, David M. [1]
Affiliations
[1] Columbia Univ, 500 W 120th St, New York, NY 10027 USA
[2] Tufts Univ, 161 Coll Ave, Medford, MA 02155 USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70 | 2017 / Vol. 70
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Word embeddings are a widely used tool to analyze language, and exponential family embeddings (Rudolph et al., 2016) generalize the technique to other types of data. One challenge in fitting embedding methods is sparse data, such as a document/term matrix that contains many zeros. To address this issue, practitioners typically downweight or subsample the zeros, thus focusing learning on the non-zero entries. In this paper, we develop zero-inflated embeddings, a new embedding method that is designed to learn from sparse observations. In a zero-inflated embedding (ZIE), a zero in the data can come from an interaction with other data (i.e., an embedding) or from a separate process by which many observations are equal to zero (i.e., a probability mass at zero). Fitting a ZIE naturally downweights the zeros and dampens their influence on the model. Across many types of data (language, movie ratings, shopping histories, and bird watching logs), we found that zero-inflated embeddings provide improved predictive performance over standard approaches and find better vector representations of items.
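The zero-inflation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a Poisson observation model, and the names `zi_poisson_log_prob`, `zero_responsibility`, `rate`, and `pi_zero` are hypothetical. In an embedding model, `rate` would be produced from the item's embedding and its context; here it is just a number.

```python
import math


def zi_poisson_log_prob(x, rate, pi_zero):
    """Log-probability of a count x under a zero-inflated Poisson.

    Assumption: with probability pi_zero the observation is a zero from
    the separate point mass; otherwise it is drawn from Poisson(rate).
    """
    log_poisson = x * math.log(rate) - rate - math.lgamma(x + 1)
    if x == 0:
        # A zero can come from either component, so both paths are summed.
        return math.log(pi_zero + (1.0 - pi_zero) * math.exp(log_poisson))
    # A non-zero count can only come from the Poisson component.
    return math.log(1.0 - pi_zero) + log_poisson


def zero_responsibility(rate, pi_zero):
    """Posterior probability that an observed zero came from the Poisson
    component rather than from the point mass at zero."""
    p_zero_poisson = math.exp(-rate)
    numerator = (1.0 - pi_zero) * p_zero_poisson
    return numerator / (pi_zero + numerator)
```

Under these assumptions, the derivative of the zero's log-probability with respect to `rate` is `-zero_responsibility(rate, pi_zero)`, whereas a plain Poisson gives `-1`. Since the responsibility is below 1 whenever `pi_zero > 0`, each zero pulls on the model less strongly, which is one way to read the abstract's statement that fitting downweights the zeros.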
Pages: 9
Related papers
  • [31] A Zero-Inflated Regression Model for Grouped Data
    Brown, Sarah
    Duncan, Alan
    Harris, Mark N.
    Roberts, Jennifer
    Taylor, Karl
    OXFORD BULLETIN OF ECONOMICS AND STATISTICS, 2015, 77 (06) : 822 - 831
  • [32] Bayesian Analysis for the Zero-inflated Regression Models
    Jane, Hakjin
    Kang, Yunhee
    Lee, S.
    Kim, Seong W.
    KOREAN JOURNAL OF APPLIED STATISTICS, 2008, 21 (04) : 603 - 613
  • [33] On zero-inflated mixed Poisson transmuted exponential distribution: Properties and applications to observation with excess zeros
    Adetunji, Ademola Abiodun
    Sabri, Shamsul Rijal Muhammed
    MAEJO INTERNATIONAL JOURNAL OF SCIENCE AND TECHNOLOGY, 2023, 17 (01) : 68 - 80
  • [34] Multivariate zero-inflated Poisson models and their applications
    Li, CS
    Lu, JC
    Park, JH
    TECHNOMETRICS, 1999, 41 (01) : 29 - 38
  • [35] Small Area Estimation for Zero-Inflated Data
    Chandra, Hukum
    Sud, U. C.
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2012, 41 (05) : 632 - 643
  • [36] Semiparametric analysis of zero-inflated count data
    Lam, K. F.
    Xue, Hongqi
    Cheung, Yin Bun
    BIOMETRICS, 2006, 62 (04) : 996 - 1003
  • [37] Document analysis and visualization with zero-inflated Poisson
    Alvarez, Dora
    Hidalgo, Hugo
    DATA MINING AND KNOWLEDGE DISCOVERY, 2009, 19 (01) : 1 - 23
  • [38] A zero-inflated overdispersed hierarchical Poisson model
    Kassahun, Wondwosen
    Neyens, Thomas
    Faes, Christel
    Molenberghs, Geert
    Verbeke, Geert
    STATISTICAL MODELLING, 2014, 14 (05) : 439 - 456
  • [39] Zero-Inflated Spatial Models: Application and Interpretation
    Ainsworth, L. M.
    Dean, C. B.
    Joy, R.
    ADVANCES AND CHALLENGES IN PARAMETRIC AND SEMI-PARAMETRIC ANALYSIS FOR CORRELATED DATA, 2016, 218 : 75 - 96
  • [40] Confidence intervals for zero-inflated gamma distribution
    Wang, Xiao
    Li, Min
    Sun, Weina
    Gao, Zheng
    Li, Xinmin
    COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2024, 53 (07) : 3418 - 3435