ViTMa: A Novel Hybrid Vision Transformer and Mamba for Kinship Recognition in Indonesian Facial Micro-Expressions

Cited by: 0
Authors
Fibriani, Ike [1 ,2 ]
Yuniarno, Eko Mulyanto [1 ,3 ]
Mardiyanto, Ronny [1 ]
Purnomo, Mauridhi Hery [1 ,3 ,4 ]
Affiliations
[1] Sepuluh Nopember Inst Technol, Dept Elect Engn, Surabaya 60111, Indonesia
[2] Univ Jember, Dept Elect Engn, Jember 68121, Indonesia
[3] Sepuluh Nopember Inst Technol, Dept Comp Engn, Surabaya 60111, Indonesia
[4] Univ Ctr Excellence Artificial Intelligence Health, Surabaya 60111, Indonesia
Source
IEEE ACCESS, 2024, Vol. 12
Keywords
Feature extraction; Face recognition; Visualization; Image recognition; Accuracy; Transformers; Marine vehicles; Faces; Computer architecture; Videos; Vision transformers; Mamba; Siamese neural network; feature fusion; kinship recognition; micro-expressions
DOI
10.1109/ACCESS.2024.3487180
CLC Classification
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
Kinship recognition that primarily exploits facial micro-expressions is an interesting and challenging problem that aims to determine whether multiple individuals belong to the same family. Previous approaches have been limited by model capacity and insufficient training data, restricting them to low-level, hand-crafted features and shallow models. Such hand-crafted features cannot capture kinship information effectively, leading to suboptimal accuracy. In this paper, we propose a kinship recognition method that exploits facial micro-expressions using a hybrid Vision Transformer and Mamba (ViTMa) model with modified Deep Feature Fusion, which combines different backbone architectures and feature fusion strategies. The ViTMa model is pre-trained on a large dataset and adapted to Indonesian facial images. The Siamese architecture processes two input images, extracts and fuses their features, and passes the fused representation to a classification network. Experiments on the FIW-Local Indonesia dataset demonstrate the effectiveness of this method: the best model, using B16 quadratic features and multiplicative fusion, achieves an average accuracy of 85.18% across all kinship categories, outperforming previous approaches. We also find that B16, despite being the smallest backbone, performs best compared with larger backbones such as L16 (average accuracy 67.99%), B32 (72.98%), and L32 (71.69%). Thus, the ViTMa model with the proposed B16 quadratic features and multiplicative fusion strategy achieves the best performance and outperforms previous studies.
Pages
164002-164017 (16 pages)
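The abstract describes a Siamese pipeline in which a shared ViTMa backbone embeds each of the two face images, the embeddings are combined via quadratic features and multiplicative fusion, and a classification network decides kin versus non-kin. The sketch below is a minimal, hypothetical PyTorch illustration of how such a pipeline could be wired; the PlaceholderBackbone, the layer sizes, and the specific reading of "quadratic features with multiplicative fusion" as element-wise squaring followed by element-wise multiplication are assumptions made for illustration, not the authors' ViTMa implementation.

```python
# Minimal sketch (not the authors' code): a Siamese pipeline where a shared
# backbone embeds two face images, the embeddings are combined with a
# multiplicative fusion of squared ("quadratic") features, and a small
# classifier predicts kin / non-kin. PlaceholderBackbone is a stand-in for
# the hybrid ViT + Mamba (ViTMa) encoder described in the paper.
import torch
import torch.nn as nn


class PlaceholderBackbone(nn.Module):
    """Toy encoder; the paper uses a hybrid ViT/Mamba backbone (e.g. B16)."""

    def __init__(self, embed_dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),                        # flatten the 3x224x224 image
            nn.Linear(3 * 224 * 224, embed_dim)  # project to an embedding
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class SiameseKinshipModel(nn.Module):
    def __init__(self, embed_dim: int = 768):
        super().__init__()
        # One backbone instance is shared by both inputs (Siamese weights).
        self.backbone = PlaceholderBackbone(embed_dim)
        self.classifier = nn.Sequential(
            nn.Linear(embed_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1)                    # kin / non-kin logit
        )

    def forward(self, img_a: torch.Tensor, img_b: torch.Tensor) -> torch.Tensor:
        fa = self.backbone(img_a)
        fb = self.backbone(img_b)
        # Assumed fusion: square each embedding, then combine element-wise
        # by multiplication ("quadratic features" + "multiplicative fusion").
        fused = (fa ** 2) * (fb ** 2)
        return self.classifier(fused)


if __name__ == "__main__":
    model = SiameseKinshipModel()
    a = torch.randn(4, 3, 224, 224)   # batch of 4 face pairs
    b = torch.randn(4, 3, 224, 224)
    logits = model(a, b)
    print(logits.shape)               # torch.Size([4, 1])
```

A full implementation would replace PlaceholderBackbone with the hybrid ViT/Mamba encoder (e.g. the B16 variant reported as best in the abstract) and train on labeled image pairs with a binary kinship loss.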