Adapting Text Embeddings for Causal Inference

Authors
Veitch, Victor [1]
Sridhar, Dhanya
Blei, David M.
Affiliations
[1] Columbia Univ, Dept Stat, New York, NY 10027 USA
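Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract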
Does adding a theorem to a paper affect its chance of acceptance? Does labeling a post with the author's gender affect the post's popularity? This paper develops a method to estimate such causal effects from observational text data, adjusting for confounding features of the text such as the subject or writing quality. We assume that the text suffices for causal adjustment but that, in practice, it is prohibitively high-dimensional. To address this challenge, we develop causally sufficient embeddings, low-dimensional document representations that preserve sufficient information for causal identification and allow for efficient estimation of causal effects. Causally sufficient embeddings combine two ideas. The first is supervised dimensionality reduction: causal adjustment requires only the aspects of text that are predictive of both the treatment and outcome. The second is efficient language modeling: representations of text are designed to dispose of linguistically irrelevant information, and this information is also causally irrelevant. Our method adapts language models (specifically, word embeddings and topic models) to learn document embeddings that are able to predict both treatment and outcome. We study causally sufficient embeddings with semi-synthetic datasets and find that they improve causal estimation over related embedding methods. We illustrate the methods by answering the two motivating questions: the effect of a theorem on paper acceptance and the effect of a gender label on post popularity. Code and data are available at github.com/vveitch/causaltext-embeddings-tf2.
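The adjustment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the simulated vector `z` stands in for a learned document embedding, the linear outcome models stand in for the adapted language model, and every function name and simulation parameter here is a hypothetical choice made for illustration. The sketch only shows why adjusting for a low-dimensional representation that predicts both treatment and outcome can remove confounding bias that a naive comparison of group means leaves in place.

```python
import numpy as np


def simulate(n=5000, dim=3, true_effect=2.0, seed=0):
    """Simulate confounded data (all values are illustrative assumptions)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(n, dim))  # stand-in for a document embedding
    propensity = 1.0 / (1.0 + np.exp(-1.5 * z[:, 0]))  # z drives treatment
    t = rng.binomial(1, propensity)
    y = true_effect * t + 3.0 * z[:, 0] + rng.normal(size=n)  # z drives outcome
    return z, t, y


def fit_linear(X, y):
    """Ordinary least squares with an intercept column."""
    Xb = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return coef


def predict_linear(coef, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return Xb @ coef


def plugin_ate(q0, q1):
    """Plug-in estimate: average of predicted treated minus untreated outcome."""
    return float(np.mean(q1 - q0))


z, t, y = simulate()

# Naive estimate: difference in mean outcomes, ignoring confounding.
naive = float(y[t == 1].mean() - y[t == 0].mean())

# Adjusted estimate: fit one outcome model per treatment arm on the embedding,
# then predict both potential outcomes for every unit.
coef0 = fit_linear(z[t == 0], y[t == 0])
coef1 = fit_linear(z[t == 1], y[t == 1])
adjusted = plugin_ate(predict_linear(coef0, z), predict_linear(coef1, z))

print("naive estimate:   ", round(naive, 3))
print("adjusted estimate:", round(adjusted, 3))
```

In this simulation the true effect is fixed at 2.0 by construction, so the two printed numbers can be compared against it directly. The outcome model is correctly specified here, which real text data will not guarantee.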
Pages: 919-928
Number of pages: 10