Adapting Text Embeddings for Causal Inference

Authors
Veitch, Victor [1]
Sridhar, Dhanya
Blei, David M.
Affiliations
[1] Columbia Univ, Dept Stat, New York, NY 10027 USA
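Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract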
Does adding a theorem to a paper affect its chance of acceptance? Does labeling a post with the author's gender affect the post's popularity? This paper develops a method to estimate such causal effects from observational text data, adjusting for confounding features of the text such as the subject or writing quality. We assume that the text suffices for causal adjustment but that, in practice, it is prohibitively high-dimensional. To address this challenge, we develop causally sufficient embeddings, low-dimensional document representations that preserve sufficient information for causal identification and allow for efficient estimation of causal effects. Causally sufficient embeddings combine two ideas. The first is supervised dimensionality reduction: causal adjustment requires only the aspects of text that are predictive of both the treatment and outcome. The second is efficient language modeling: representations of text are designed to dispose of linguistically irrelevant information, and this information is also causally irrelevant. Our method adapts language models (specifically, word embeddings and topic models) to learn document embeddings that are able to predict both treatment and outcome. We study causally sufficient embeddings with semi-synthetic datasets and find that they improve causal estimation over related embedding methods. We illustrate the methods by answering the two motivating questions: the effect of a theorem on paper acceptance and the effect of a gender label on post popularity. Code and data are available at github.com/vveitch/causaltext-embeddings-tf2.
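The kind of confounding adjustment the abstract refers to can be illustrated with a toy sketch. This is not the authors' implementation: a discrete stratum label z (for example, a document's topic) stands in for a learned low-dimensional embedding, the data are made up, and the names RECORDS, naive_difference, and adjusted_ate are hypothetical. The sketch compares a naive difference of means with a plug-in estimate that averages Q(1, z) - Q(0, z) over all documents, where Q(t, z) is the mean outcome among documents with treatment t in stratum z.

```python
from collections import defaultdict

# Toy records: (z, t, y) = (stratum standing in for an embedding,
# binary treatment, outcome). Within each stratum the treated
# outcome exceeds the control outcome by exactly 1.
RECORDS = [
    (0, 1, 2.0), (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0),
    (1, 1, 6.0), (1, 1, 6.0), (1, 1, 6.0), (1, 0, 5.0),
]

def naive_difference(records):
    """Difference of mean outcomes between treated and control, ignoring z."""
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

def adjusted_ate(records):
    """Plug-in estimate: average of Q(1, z_i) - Q(0, z_i) over all records.

    Assumes overlap: every stratum contains both treated and control
    records; otherwise the lookup below raises KeyError.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for z, t, y in records:
        sums[(t, z)] += y
        counts[(t, z)] += 1
    q = {key: sums[key] / counts[key] for key in sums}
    return sum(q[(1, z)] - q[(0, z)] for z, _, _ in records) / len(records)

if __name__ == "__main__":
    print("naive:", naive_difference(RECORDS))
    print("adjusted:", adjusted_ate(RECORDS))
```

In this toy data the treatment is more common in the stratum with higher outcomes, so the naive comparison overstates the effect, while the stratified estimate recovers the within-stratum difference. With real text, the stratum label would be replaced by predictions from models fit on document embeddings.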
Pages: 919-928
Number of pages: 10