Computing Happiness from Textual Data

Cited by: 3

Authors:

Mohamed, Emad [1]

Mostafa, Sayed A. [2]

Affiliations:

[1] Univ Wolverhampton, Res Grp Computat Linguist, Wolverhampton WV1 1LY, England

[2] North Carolina A&T State Univ, Dept Math & Stat, Greensboro, NC 27411 USA

Source:

STATS | 2019 / Vol. 2 / Issue 3

Keywords:

fastText; gradient boosting; happiness; lemmatization; lexical analysis; logistic regression; parsing; topic modeling;

DOI:

10.3390/stats2030025

Chinese Library Classification number:

O1 [Mathematics];

Discipline classification code:

0701 ; 070101 ;

Abstract:

In this paper, we use a corpus of about 100,000 happy moments written by people of different genders, marital statuses, parenthood statuses, and ages to explore the following questions: Are there differences between men and women, married and unmarried individuals, parents and non-parents, and people of different age groups in terms of their causes of happiness and how they express happiness? Can gender, marital status, parenthood status, and/or age be predicted from textual data expressing happiness? The first question is tackled in two steps: first, we transform the happy moments into a set of topics, lemmas, part-of-speech sequences, and dependency relations; then, we use each set as predictors in multi-variable binary and multinomial logistic regressions to rank these predictors in terms of their influence on each outcome variable (gender, marital status, parenthood status, and age). For the prediction task, we use character, lexical, grammatical, semantic, and syntactic features in a machine learning document classification approach. The classification algorithms used include logistic regression, gradient boosting, and fastText. Our results show that textual data expressing moments of happiness can be quite beneficial in understanding the "causes of happiness" for different social groups, and that social characteristics like gender, marital status, parenthood status, and, to some extent, age can be successfully predicted from such textual data. This research aims to bring together elements from philosophy and psychology to be examined by computational corpus linguistics methods in a way that promotes the use of Natural Language Processing for the Humanities.
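The two-step idea in the abstract (turn each happy moment into lexical predictors, then rank those predictors with a binary logistic regression) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the eight toy moments and their parent/non-parent labels are invented and are not from the paper's corpus, whitespace tokenization stands in for lemmatization, and the plain gradient-descent trainer with its `lr`, `epochs`, and `l2` settings is a hypothetical substitute for whatever estimation procedure the authors used.

```python
import math
from collections import Counter

# Invented toy moments (NOT from the paper's corpus). Label 1 = parent, 0 = non-parent.
TOY_MOMENTS = [
    ("my son took his first steps today", 1),
    ("my daughter won her school race", 1),
    ("i played with my son at the park", 1),
    ("my daughter hugged me after school", 1),
    ("i went to a concert with friends", 0),
    ("i bought a new video game", 0),
    ("my friends threw me a party", 0),
    ("i finished a video game with friends", 0),
]

def tokenize(text):
    """Lowercase whitespace tokenization; a crude stand-in for lemmatization."""
    return text.lower().split()

def build_vocab(texts):
    """Map each distinct token to a column index (sorted for determinism)."""
    tokens = sorted({tok for text in texts for tok in tokenize(text)})
    return {tok: idx for idx, tok in enumerate(tokens)}

def vectorize(text, vocab):
    """Bag-of-words count vector; out-of-vocabulary tokens are ignored."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokenize(text)).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    return vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """Estimated probability that the writer belongs to class 1."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

def train(X, y, lr=0.3, epochs=1000, l2=0.01):
    """Full-batch gradient descent on L2-regularized logistic loss."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, weights, bias) - yi
            grad_b += err
            for j, v in enumerate(xi):
                grad_w[j] += err * v
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

texts = [text for text, _ in TOY_MOMENTS]
labels = [label for _, label in TOY_MOMENTS]
vocab = build_vocab(texts)
X = [vectorize(text, vocab) for text in texts]
weights, bias = train(X, labels)

# Rank predictors by coefficient: most negative favor class 0, most positive class 1.
ranked = sorted(vocab, key=lambda tok: weights[vocab[tok]])
print("most non-parent-like tokens:", ranked[:3])
print("most parent-like tokens:", ranked[-3:])
```

Sorting tokens by fitted coefficient is the simplest way to read off which predictors push toward each group. On a real corpus one would also want held-out evaluation and some measure of uncertainty for each coefficient, which this sketch omits.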


Pages: 347-370

Number of pages: 24
