A vision transformer-based automated human identification using ear biometrics

Cited by: 6
Authors
Mehta, Ravishankar [1]
Shukla, Sindhuja [1]
Pradhan, Jitesh [1]
Singh, Koushlendra Kumar [1]
Kumar, Abhinav [2]
Affiliations
[1] Natl Inst Technol Jamshedpur, Dept CSE, Machine Vis & Intelligence Lab, Jamshedpur 831014, Jharkhand, India
[2] Motilal Nehru Natl Inst Technol Allahabad, Allahabad 211001, Uttar Pradesh, India
Keywords
Vision transformer; Patch; Embedding; Attention network; Data augmentation
DOI
10.1016/j.jisa.2023.103599
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In recent years, Vision Transformers (ViTs) have gained significant attention in the field of computer vision for their impressive performance in various tasks, including image recognition, machine translation, question answering, text classification, and image captioning. ViTs perform better on several benchmark image datasets, such as ImageNet, with fewer parameters and less computation than CNN-based models. The self-attention part takes over the feature-extraction role of the convolutional neural network (CNN). This work proposes a vision transformer-based framework for 2D ear recognition, in which self-attention is applied jointly with CNNs. The model has been adjusted and fine-tuned to the specific characteristics of the ear dataset and the desired performance requirements. In deep learning, CNNs have become the de facto choice, mainly because their inductive biases let them learn spatially local representations; learning global representations through the self-attention mechanism of ViTs further enhances recognition accuracy. This is made possible by applying the transformer directly to the sequence of image patches, which improves image classification performance. The proposed work uses several image patch sizes during model training. Experiments show that a patch size of 16 x 16 achieves the highest accuracy, 99.36%. The proposed model has been validated on the Kaggle and IITD-II datasets. Its efficiency relative to existing models is also reported.
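The patch-and-attend idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 224 x 224 x 3 input size, the embedding width of 64, the random weights, and the helper names `image_to_patches` and `self_attention` are all assumptions made for illustration. Only the 16 x 16 patch size comes from the abstract.

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping square patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c) -> (gh * gw, p * p * c)
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(gh * gw, patch_size * patch_size * c)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))           # assumed input size, not from the paper
patches = image_to_patches(image, 16)       # 16 x 16 patches, as in the abstract

embed_dim = 64                              # hypothetical embedding width
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], embed_dim))
tokens = patches @ w_embed                  # linear patch embedding

w_q, w_k, w_v = (rng.normal(scale=0.02, size=(embed_dim, embed_dim)) for _ in range(3))
attended, weights = self_attention(tokens, w_q, w_k, w_v)

print(patches.shape, tokens.shape, attended.shape)
```

Under these assumptions, a 224 x 224 image yields 14 x 14 = 196 patches, each flattened to 16 x 16 x 3 = 768 values. Every row of `weights` sums to 1, so each output token is a weighted mix of all patches, which is how self-attention captures global context that a local convolution does not see in a single layer.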
Pages: 12