Computing Happiness from Textual Data

Cited by: 3
Authors
Mohamed, Emad [1]
Mostafa, Sayed A. [2]
Affiliations
[1] Univ Wolverhampton, Res Grp Computat Linguist, Wolverhampton WV1 1LY, England
[2] North Carolina A&T State Univ, Dept Math & Stat, Greensboro, NC 27411 USA
Source
STATS | 2019 / Volume 2 / Issue 03
Keywords
fastText; gradient boosting; happiness; lemmatization; lexical analysis; logistic regression; parsing; topic modeling
DOI
10.3390/stats2030025
Chinese Library Classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
In this paper, we use a corpus of about 100,000 happy moments written by people of different genders, marital statuses, parenthood statuses, and ages to explore the following questions: Are there differences between men and women, married and unmarried individuals, parents and non-parents, and people of different age groups in terms of their causes of happiness and how they express happiness? Can gender, marital status, parenthood status, and/or age be predicted from textual data expressing happiness? The first question is tackled in two steps: first, we transform the happy moments into a set of topics, lemmas, part-of-speech sequences, and dependency relations; then, we use each set as predictors in multi-variable binary and multinomial logistic regressions to rank these predictors in terms of their influence on each outcome variable (gender, marital status, parenthood status, and age). For the prediction task, we use character, lexical, grammatical, semantic, and syntactic features in a machine learning document classification approach. The classification algorithms used include logistic regression, gradient boosting, and fastText. Our results show that textual data expressing moments of happiness can be quite beneficial in understanding the "causes of happiness" for different social groups, and that social characteristics like gender, marital status, parenthood status, and, to some extent, age can be successfully predicted from such textual data. This research aims to bring together elements from philosophy and psychology to be examined by computational corpus linguistics methods in a way that promotes the use of Natural Language Processing for the Humanities.
Pages: 347-370
Page count: 24
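
The two-step idea in the abstract (turn each happy moment into features, then fit a logistic regression and rank the features by their influence on an outcome) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the eight toy sentences and their parent/non-parent labels are invented, whitespace tokens stand in for the paper's lemmas and topics, and the model is a plain gradient-descent logistic regression rather than the authors' actual setup.

```python
import math

# Hypothetical toy data (NOT from the paper's corpus): 1 = parent, 0 = non-parent.
MOMENTS = [
    ("my son took his first steps today", 1),
    ("my daughter won her school race", 1),
    ("i played with my son at the park", 1),
    ("my daughter hugged me after school", 1),
    ("i went hiking with my friends", 0),
    ("i got a new job offer today", 0),
    ("my friends threw me a party", 0),
    ("i finished a great video game", 0),
]


def tokenize(text):
    """Lowercase whitespace tokenizer; a stand-in for real lemmatization."""
    return text.lower().split()


# Vocabulary: every token seen in the toy data, in alphabetical order.
vocab = sorted({tok for text, _ in MOMENTS for tok in tokenize(text)})
index = {tok: j for j, tok in enumerate(vocab)}


def vectorize(text):
    """Binary bag-of-words vector: 1.0 if the token occurs in the text."""
    vec = [0.0] * len(vocab)
    for tok in tokenize(text):
        if tok in index:  # out-of-vocabulary tokens are ignored
            vec[index[tok]] = 1.0
    return vec


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, epochs=300, lr=0.5, l2=0.01):
    """Batch gradient descent on the L2-regularized logistic loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


X = [vectorize(text) for text, _ in MOMENTS]
y = [label for _, label in MOMENTS]
w, b = train(X, y)

# Rank predictors: large positive weights point toward "parent",
# large negative weights toward "non-parent".
weights = dict(zip(vocab, w))
ranking = sorted(weights.items(), key=lambda item: item[1], reverse=True)


def predict(text):
    """Return 1 (parent) if the predicted probability is at least 0.5."""
    score = b + sum(wj * xj for wj, xj in zip(w, vectorize(text)))
    return 1 if sigmoid(score) >= 0.5 else 0


print("Most 'parent'-like tokens:", [tok for tok, _ in ranking[:5]])
print("Most 'non-parent'-like tokens:", [tok for tok, _ in ranking[-5:]])
```

On real data, one would replace the toy tokenizer with a lemmatizer or parser and use an established library implementation; the sketch only shows how coefficient signs and magnitudes give a ranking of predictors.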