A vision transformer-based automated human identification using ear biometrics

Cited by: 4
Authors
Mehta, Ravishankar [1 ]
Shukla, Sindhuja [1 ]
Pradhan, Jitesh [1 ]
Singh, Koushlendra Kumar [1 ]
Kumar, Abhinav [2 ]
Affiliations
[1] Natl Inst Technol Jamshedpur, Dept CSE, Machine Vis & Intelligence Lab, Jamshedpur 831014, Jharkhand, India
[2] Motilal Nehru Natl Inst Technol Allahabad, Allahabad 211001, Uttar Pradesh, India
Keywords
Vision transformer; Patch; Embedding; Attention network; Data augmentation
DOI
10.1016/j.jisa.2023.103599
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline code
0812
Abstract
In recent years, Vision Transformers (ViTs) have gained significant attention in computer vision for their impressive performance across a range of tasks, including image recognition, as well as language tasks such as machine translation, question answering, text classification, and image captioning. ViTs perform well on several benchmark image datasets such as ImageNet, with fewer parameters and less computation than CNN-based models. In a ViT, the self-attention mechanism takes over the feature-extraction role of the convolutional neural network (CNN). The proposed work provides a vision transformer-based framework for 2D ear recognition, in which self-attention is applied jointly with CNNs. Adjustments and fine-tuning have been carried out based on the specific characteristics of the ear datasets and the desired performance requirements. In deep learning, CNNs have become the de facto standard largely because their inductive biases let them learn spatially local representations; learning global representations through the self-attention mechanism of ViTs further improves recognition accuracy. This is made possible by applying the transformer directly to a sequence of image patches for image classification. The proposed work evaluates various image patch sizes during model training; experimental analysis shows that a 16 x 16 patch size achieves the highest accuracy of 99.36%. The proposed model has been validated on the Kaggle and IITD-II datasets, and its efficiency relative to existing models is also reported.
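The abstract's core idea of applying the transformer "directly to a sequence of image patches" can be illustrated with a minimal sketch of the patch-splitting step. This is not the authors' implementation; it is a generic NumPy illustration (function name `patchify` and the 224 x 224 input size are assumptions) of how a 16 x 16 patch size turns an image into the token sequence a ViT would then linearly embed:

```python
import numpy as np

def patchify(image: np.ndarray, patch_size: int = 16) -> np.ndarray:
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C):
    the raw token sequence a ViT linearly projects into patch embeddings.
    Illustrative sketch only, not the paper's actual code.
    """
    h, w, c = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "image must tile evenly"
    return (
        image.reshape(h // patch_size, patch_size, w // patch_size, patch_size, c)
        .transpose(0, 2, 1, 3, 4)          # group the two grid axes together
        .reshape(-1, patch_size * patch_size * c)
    )

# Example: a 224 x 224 RGB ear image with 16 x 16 patches
# yields 196 tokens, each of length 16 * 16 * 3 = 768.
tokens = patchify(np.zeros((224, 224, 3)), patch_size=16)
print(tokens.shape)  # (196, 768)
```

In a full ViT, each of these flattened patches is multiplied by a learned projection matrix, a position embedding is added, and the resulting sequence is fed through stacked self-attention layers.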
Pages: 12
Related papers
50 records total (first 10 shown)
  • [1] AR model based human identification using ear biometrics
    [J]. 1600, Science and Engineering Research Support Society (07):
  • [2] Strawberry disease identification with vision transformer-based models
    Nguyen, Hai Thanh
    Tran, Tri Dac
    Nguyen, Thanh Tuong
    Pham, Nhi Minh
    Nguyen Ly, Phuc Hoang
    Luong, Huong Hoang
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (29) : 73101 - 73126
  • [3] Automated Multimodal Biometrics Using Face and Ear
    Luciano, Lorenzo
    Krzyzak, Adam
    [J]. IMAGE ANALYSIS AND RECOGNITION, PROCEEDINGS, 2009, 5627 : 451 - 460
  • [4] Driver Identification Using Ear Biometrics
    Kalikova, Jana
    Krcal, Jan
    [J]. 2018 5TH INTERNATIONAL CONFERENCE ON SYSTEMS AND INFORMATICS (ICSAI), 2018, : 1277 - 1281
  • [5] Automated human identification using ear imaging
    Kumar, Ajay
    Wu, Chenye
    [J]. PATTERN RECOGNITION, 2012, 45 (03) : 956 - 968
  • [6] Perspective methods of human identification: ear biometrics
    Choras, Michal
    [J]. OPTO-ELECTRONICS REVIEW, 2008, 16 (01) : 85 - 96
  • [7] Vision Transformer-Based Tailing Detection in Videos
    Lee, Jaewoo
    Lee, Sungjun
    Cho, Wonki
    Siddiqui, Zahid Ali
    Park, Unsang
    [J]. APPLIED SCIENCES-BASEL, 2021, 11 (24):
  • [8] Vision Transformer-Based Photovoltaic Prediction Model
    Kang, Zaohui
    Xue, Jizhong
    Lai, Chun Sing
    Wang, Yu
    Yuan, Haoliang
    Xu, Fangyuan
    [J]. ENERGIES, 2023, 16 (12)
  • [9] Vision Transformer-based pilot pose estimation
    Wu, Honglan
    Liu, Hao
    Sun, Youchao
    [J]. Beijing Hangkong Hangtian Daxue Xuebao/Journal of Beijing University of Aeronautics and Astronautics, 2024, 50 (10): : 3100 - 3110
  • [10] Transformer-based Arabic Dialect Identification
    Lin, Wanqiu
    Madhavi, Maulik
    Das, Rohan Kumar
    Li, Haizhou
    [J]. 2020 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2020), 2020, : 192 - 196