How Machine Learning Classification Accuracy Changes in a Happiness Dataset with Different Demographic Groups

Times cited: 5

Authors:

Sweeney, Colm [1]

Ennis, Edel [1]

Mulvenna, Maurice [2]

Bond, Raymond [2]

O'Neill, Siobhan [1]

Affiliations:

[1] Ulster Univ, Sch Psychol, Coleraine BT52 1SA, Londonderry, North Ireland

[2] Ulster Univ, Sch Comp, Jordanstown BT37 0QB, North Ireland

Source:

COMPUTERS | 2022 / Vol. 11 / Issue 05

Keywords:

machine learning; classification; positive psychology; GENDER

DOI:

10.3390/computers11050083

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

This study explores how machine learning classification accuracy changes across different demographic groups. HappyDB is a dataset of over 100,000 happy statements that includes demographic information on marital status, gender, age, and parenthood status. Using the dataset's happiness category field, we tested different types of machine learning classifiers to predict which category of happiness each statement belongs to, for example, whether it expresses happiness relating to achievement or to affection. The tests were initially conducted with three distinct classifiers; the best-performing model was a convolutional neural network (CNN), a deep learning model, which achieved an F1 score of 0.897 on the complete dataset. This model was then used as the main classifier to analyze the results further and to establish whether its performance varied when tested on different demographic groups. We analyzed the results to see whether classification accuracy improved for particular demographic groups, and found that prediction accuracy within this dataset declined with age, with the exception of the single-parent subgroup. The results also showed improved performance for the married and parent subgroups, and lower performance for the non-parent and unmarried subgroups, even when a balanced sample was investigated.
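The subgroup analysis described in the abstract (scoring one classifier separately on each demographic subgroup, optionally after balancing subgroup sizes) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes predictions have already been produced by some classifier (the CNN itself is not reproduced), it uses macro-averaged F1 (the abstract does not say which F1 averaging was used), it balances subgroups by random downsampling to the smallest subgroup (an assumed strategy), and the field names `marital`, `true`, and `pred` on the toy records are hypothetical.

```python
import random
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def f1_by_group(records, group_key):
    """Score the same predictions separately within each demographic subgroup."""
    groups = defaultdict(lambda: ([], []))
    for r in records:
        y_true, y_pred = groups[r[group_key]]
        y_true.append(r["true"])
        y_pred.append(r["pred"])
    return {g: macro_f1(t, p) for g, (t, p) in groups.items()}


def balanced_sample(records, group_key, seed=0):
    """Randomly downsample every subgroup to the size of the smallest one (assumed strategy)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r)
    n = min(len(rs) for rs in groups.values())
    sample = []
    for rs in groups.values():
        sample.extend(rng.sample(rs, n))
    return sample


# Toy records (hypothetical): one demographic field plus true and predicted category.
records = [
    {"marital": "married", "true": "affection", "pred": "affection"},
    {"marital": "married", "true": "achievement", "pred": "achievement"},
    {"marital": "married", "true": "affection", "pred": "affection"},
    {"marital": "single", "true": "achievement", "pred": "affection"},
    {"marital": "single", "true": "affection", "pred": "affection"},
]

print(f1_by_group(records, "marital"))
print(f1_by_group(balanced_sample(records, "marital"), "marital"))
```

A continuous field such as age would need to be binned into ranges before being used as `group_key`, since raw ages would produce many very small subgroups.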

Pages: 15
