Adapting Text Embeddings for Causal Inference

Cited by: 0

Authors:

Veitch, Victor [1]

Sridhar, Dhanya

Blei, David M.

Affiliation:

[1] Columbia Univ, Dept Stat, New York, NY 10027 USA

Source:

CONFERENCE ON UNCERTAINTY IN ARTIFICIAL INTELLIGENCE (UAI 2020) | 2020 / Vol. 124

Funding:

Natural Sciences and Engineering Research Council of Canada;

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Does adding a theorem to a paper affect its chance of acceptance? Does labeling a post with the author's gender affect the post's popularity? This paper develops a method to estimate such causal effects from observational text data, adjusting for confounding features of the text such as the subject or writing quality. We assume that the text suffices for causal adjustment but that, in practice, it is prohibitively high-dimensional. To address this challenge, we develop causally sufficient embeddings, low-dimensional document representations that preserve sufficient information for causal identification and allow for efficient estimation of causal effects. Causally sufficient embeddings combine two ideas. The first is supervised dimensionality reduction: causal adjustment requires only the aspects of text that are predictive of both the treatment and outcome. The second is efficient language modeling: representations of text are designed to dispose of linguistically irrelevant information, and this information is also causally irrelevant. Our method adapts language models (specifically, word embeddings and topic models) to learn document embeddings that are able to predict both treatment and outcome. We study causally sufficient embeddings with semi-synthetic datasets and find that they improve causal estimation over related embedding methods. We illustrate the methods by answering the two motivating questions: the effect of a theorem on paper acceptance and the effect of a gender label on post popularity. Code and data are available at github.com/vveitch/causaltext-embeddings-tf2.
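The adjustment idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation: it replaces learned document embeddings with fixed synthetic vectors and replaces the language model with a plain linear outcome regression fitted separately in each treatment arm. All names (`simulate`, `naive_estimate`, `adjusted_estimate`) and all numeric settings are assumptions chosen for illustration. It shows only that adjusting for a low-dimensional representation that carries the confounding information can remove the bias of an unadjusted comparison.

```python
import numpy as np


def simulate(n=5000, dim=5, true_effect=1.0, seed=0):
    """Synthetic stand-in for (embedding, treatment, outcome) triples."""
    rng = np.random.default_rng(seed)
    emb = rng.normal(size=(n, dim))      # hypothetical document embeddings
    confounder = emb[:, 0]               # one direction drives both t and y
    propensity = 1.0 / (1.0 + np.exp(-2.0 * confounder))
    t = rng.binomial(1, propensity)      # treatment depends on the confounder
    y = true_effect * t + 3.0 * confounder + rng.normal(size=n)
    return emb, t, y


def naive_estimate(t, y):
    """Unadjusted difference in mean outcome between treated and untreated."""
    return float(y[t == 1].mean() - y[t == 0].mean())


def adjusted_estimate(emb, t, y):
    """Plug-in estimate: average of Q(1, emb) - Q(0, emb) over all units,
    where Q is a linear outcome model fitted within each treatment arm."""
    design_all = np.column_stack([np.ones(len(y)), emb])

    def fit_predict(mask):
        # Least-squares fit on one arm, then predict for every unit.
        coef, *_ = np.linalg.lstsq(design_all[mask], y[mask], rcond=None)
        return design_all @ coef

    q1 = fit_predict(t == 1)
    q0 = fit_predict(t == 0)
    return float(np.mean(q1 - q0))


if __name__ == "__main__":
    emb, t, y = simulate()
    print("naive estimate:   ", naive_estimate(t, y))
    print("adjusted estimate:", adjusted_estimate(emb, t, y))
```

In this toy setup the true effect is 1.0 by construction, so the two printed values can be compared against it directly. With real text the embedding would have to be learned, which is the part of the problem the paper addresses and this sketch does not.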

Pages: 919-928

Page count: 10
