Comparative Analysis of Vision Transformer Models for Facial Emotion Recognition Using Augmented Balanced Datasets

Cited by: 6

Authors:

Bobojanov, Sukhrob [1]

Kim, Byeong Man [1]

Arabboev, Mukhriddin [2]

Begmatov, Shohruh [2]

Affiliations:

[1] Kumoh Natl Inst Technol, Comp Software Engn, Gumi 39177, South Korea

[2] Tashkent Univ Informat Technol, Tashkent 10084, Uzbekistan

Source:

APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 22

Keywords:

facial emotion recognition; vision transformer; data augmentation; balanced data; FER2013; RAF-DB

DOI:

10.3390/app132212271

CLC number:

O6 [Chemistry]

Discipline classification code:

0703

Abstract:

Facial emotion recognition (FER) is of great importance in the field of human-machine interfaces. Given the intricacies of human facial expressions and the inherent variations in images, which are characterized by diverse facial poses and lighting conditions, FER remains a challenging task for computer-based models. Recently, vision transformer (ViT) models have attained state-of-the-art results across various computer vision tasks, including image classification, object detection, and segmentation. Moreover, one of the most important aspects of building strong machine learning models is correcting data imbalances. To avoid biased predictions and guarantee reliable findings, it is essential to keep the class distribution of the training dataset balanced. In this work, we have chosen two widely used open-source datasets, RAF-DB and FER2013. In addition to resolving the imbalance problem, we present a new, balanced dataset, built by applying data augmentation techniques and removing poor-quality images from the FER2013 dataset. We then conduct a comprehensive evaluation of thirteen different ViT models on these three datasets. Our investigation concludes that ViT models present a promising approach for FER tasks. Among these ViT models, Mobile ViT and Tokens-to-Token ViT models appear to be the most effective, followed by PiT and Cross Former models.
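
The balancing idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which augmentation operations the authors used, so the horizontal flip below, the helper names (hflip, balance_by_augmentation, class_counts), and the toy nested-list images are all assumptions chosen only for illustration, not the paper's pipeline. The sketch tops up each minority class with augmented copies until every class matches the largest one.

```python
import random
from collections import Counter


def hflip(image):
    """Mirror a 2D image (a list of pixel rows) left to right.

    Assumption: a stand-in augmentation; the abstract does not name the
    operations actually used.
    """
    return [row[::-1] for row in image]


def balance_by_augmentation(samples, augment=hflip, seed=0):
    """Return the samples plus augmented copies so all classes are equal in size.

    samples: list of (image, label) pairs.
    Every class is topped up to the size of the largest class.
    """
    rng = random.Random(seed)

    # Group images by their emotion label.
    by_label = {}
    for image, label in samples:
        by_label.setdefault(label, []).append(image)

    target = max(len(images) for images in by_label.values())

    balanced = list(samples)
    for label, images in by_label.items():
        # Add augmented copies of randomly chosen originals until the
        # class reaches the target size.
        for _ in range(target - len(images)):
            balanced.append((augment(rng.choice(images)), label))
    return balanced


def class_counts(samples):
    """Count how many samples each label has."""
    return Counter(label for _, label in samples)
```

For example, calling balance_by_augmentation on a list with three "happy" images and one "fear" image would return six samples, three per class, with the two added "fear" images being flipped copies of the original. In practice a stronger and more varied set of augmentations would likely be needed, since repeated copies of a few originals add little new information.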

Pages: 14
