Empath: Understanding Topic Signals in Large-Scale Text

Times cited: 181

Authors:

Fast, Ethan [1]

Chen, Binbin [1]

Bernstein, Michael S. [1]

Affiliation:

[1] Stanford Univ, Stanford, CA 94305 USA

Source:

34TH ANNUAL CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2016 | 2016

Keywords:

social computing; computational social science; fiction

DOI:

10.1145/2858036.2858535

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Human language is colored by a broad range of topics, but existing text analysis tools focus on only a small number of them. We present Empath, a tool that can generate and validate new lexical categories on demand from a small set of seed terms (such as "bleed" and "punch" to generate the category "violence"). Empath draws connotations between words and phrases by training a neural embedding on more than 1.8 billion words of modern fiction. Given a small set of seed words that characterize a category, Empath uses its neural embedding to discover new related terms, then validates the category with a crowd-powered filter. Empath also analyzes text across 200 built-in, pre-validated categories that we generated from common topics in our web dataset, such as neglect, government, and social media. We show that Empath's data-driven, human-validated categories are highly correlated (r=0.906) with similar categories in LIWC.
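The category-building pipeline described in the abstract (seed words, embedding lookup, crowd filtering, then counting category words in text) can be illustrated with a small, self-contained sketch. It is not Empath's implementation or API. The toy 3-dimensional vectors, the choice to rank candidates by cosine similarity to the mean of the seed vectors, the set of "approved" words standing in for crowd votes, and the function names `expand_category`, `crowd_filter`, and `score_text` are all hypothetical and were chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy embedding. In Empath this role is played by a neural
# embedding trained on ~1.8 billion words of fiction; these vectors are made up.
TOY_EMBEDDING = {
    "bleed":  [0.9, 0.1, 0.0],
    "punch":  [0.8, 0.2, 0.1],
    "stab":   [0.85, 0.15, 0.05],
    "kick":   [0.75, 0.25, 0.1],
    "hug":    [0.1, 0.9, 0.2],
    "smile":  [0.05, 0.95, 0.1],
    "senate": [0.1, 0.1, 0.95],
    "vote":   [0.05, 0.2, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_category(seeds, embedding, k=2):
    """Assumed strategy: add the k non-seed words closest to the mean seed vector."""
    dims = len(next(iter(embedding.values())))
    centroid = [sum(embedding[s][i] for s in seeds) / len(seeds) for i in range(dims)]
    candidates = [w for w in embedding if w not in seeds]
    candidates.sort(key=lambda w: cosine(embedding[w], centroid), reverse=True)
    return list(seeds) + candidates[:k]

def crowd_filter(words, approved):
    """Stand-in for crowd validation: keep only words that workers approved."""
    return [w for w in words if w in approved]

def score_text(text, lexicon):
    """Fraction of whitespace-separated tokens that fall in each category."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {cat: sum(counts[w] for w in words) / total for cat, words in lexicon.items()}

if __name__ == "__main__":
    violence = expand_category(["bleed", "punch"], TOY_EMBEDDING, k=2)
    violence = crowd_filter(violence, approved={"bleed", "punch", "stab", "kick"})
    print(violence)
    print(score_text("He would punch and kick until they bleed", {"violence": violence}))
```

With these toy vectors, the seeds "bleed" and "punch" pull in "stab" and "kick" as their nearest neighbours, and three of the eight tokens in the sample sentence are counted as violence words. A real system would use a large learned embedding, proper tokenization, and actual human judgments in place of the hard-coded approval set.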


Pages: 4647-4657

Page count: 11
