How Machine Learning Classification Accuracy Changes in a Happiness Dataset with Different Demographic Groups

Cited by: 5
Authors
Sweeney, Colm [1]
Ennis, Edel [1]
Mulvenna, Maurice [2]
Bond, Raymond [2]
O'Neill, Siobhan [1]
Affiliations
[1] Ulster Univ, Sch Psychol, Coleraine BT52 1SA, Londonderry, Northern Ireland
[2] Ulster Univ, Sch Comp, Jordanstown BT37 0QB, Northern Ireland
Keywords
machine learning; classification; positive psychology; gender
DOI
10.3390/computers11050083
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This study explores how machine learning classification accuracy changes across demographic groups. HappyDB is a dataset of over 100,000 happy statements with accompanying demographic information: marital status, gender, age, and parenthood status. Using the happiness category field, we test different types of machine learning classifiers to predict which category of happiness each statement belongs to, for example, whether it indicates happiness relating to achievement or to affection. The tests were initially conducted with three distinct classifiers. The best-performing model was the convolutional neural network (CNN), a deep learning algorithm, which achieved an F1 score of 0.897 on the complete dataset. This model was then used as the main classifier to analyze the results further and to establish any variation in performance across demographic groups. We analyzed the results to see whether classification accuracy differed between demographic groups, and found that prediction accuracy within this dataset declined with age, with the exception of the single parent subgroup. The results also showed improved performance for the married and parent subgroups and lower performance for the non-parent and unmarried subgroups, even when investigating a balanced sample.
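The per-group comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the record field names (`marital`, `label`, `predicted`), the helper names `macro_f1` and `f1_by_group`, and the four toy records are all invented for the example, and the use of macro-averaged F1 is an assumption, since the abstract does not say which F1 averaging was used.

```python
from collections import defaultdict


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list.

    Assumption: macro averaging; the abstract does not specify the variant.
    """
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def f1_by_group(records, group_key):
    """Split predictions by a demographic field and score each subgroup."""
    buckets = defaultdict(lambda: ([], []))
    for record in records:
        true_labels, predicted_labels = buckets[record[group_key]]
        true_labels.append(record["label"])
        predicted_labels.append(record["predicted"])
    return {group: macro_f1(t, p) for group, (t, p) in buckets.items()}


# Hypothetical toy records; not taken from HappyDB.
records = [
    {"marital": "married", "label": "affection", "predicted": "affection"},
    {"marital": "married", "label": "achievement", "predicted": "achievement"},
    {"marital": "single", "label": "affection", "predicted": "achievement"},
    {"marital": "single", "label": "achievement", "predicted": "achievement"},
]

scores = f1_by_group(records, "marital")
for group, score in sorted(scores.items()):
    print(f"{group}: macro F1 = {score:.3f}")
```

Scoring each subgroup separately with the same trained classifier is what makes differences such as "married versus unmarried" visible; a single score over the pooled data would hide them.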
Pages: 15