Research on facial recognition of sika deer based on vision transformer

Cited by: 3
Authors
Gong, He [1 ,2 ,3 ,4 ]
Luo, Tianye [1 ]
Ni, Lingyun [1 ]
Li, Ji [1 ]
Guo, Jie [1 ]
Liu, Tonghe [1 ]
Feng, Ruilong [1 ]
Mu, Ye [1 ,2 ,3 ,4 ]
Hu, Tianli [1 ,2 ,3 ,4 ]
Sun, Yu [1 ,2 ,3 ,4 ]
Guo, Ying [1 ,2 ,3 ,4 ]
Li, Shijun [5 ,6 ]
Affiliations
[1] Jilin Agr Univ, Coll Informat Technol, Changchun 130118, Peoples R China
[2] Jilin Prov Agr Internet Things Technol Collaborat, Changchun 130118, Peoples R China
[3] Jilin Prov Intelligent Environm Engn Res Ctr, Changchun 130118, Peoples R China
[4] Jilin Prov Coll & Univ 13 Five Year Engn Res Ctr, Changchun 130118, Peoples R China
[5] Wuzhou Univ, Coll Informat Technol, Wuzhou 543003, Peoples R China
[6] Guangxi Key Lab Machine Vis & Intelligent Control, Wuzhou 543003, Peoples R China
Keywords
Sika deer; Vision transformer; DenseNet; Face recognition; Patch flattening
DOI
10.1016/j.ecoinf.2023.102334
Chinese Library Classification
Q14 [Ecology (Bioecology)]
Discipline codes
071012; 0713
Abstract
Amid global concern about endangered ecosystems, identifying individual animals is vital. To this end, this work designs a Vision Transformer (ViT) based model for individual recognition of sika deer from facial images. Satisfactory results require not only high-level semantic information but also low-level cues such as texture and color, so relying on high-level retrieval features alone is insufficient. The standard ViT, or a ViT with a ResNet (residual neural network) backbone, may not be the best solution, because the direct patch-flattening embedding used in the conventional ViT is poorly suited to deer face recognition. Therefore, a DenseNet (densely connected convolutional network) block is used as Module 1 to extract low-level features: its dense connections allow any layer to communicate directly with the others, enabling feature reuse and maximizing the exchange of information between layers. Module 2 applies a mask to remove extraneous information from the images and reduce the interference of complicated backgrounds with identification accuracy. The feature maps output by the two modules are then fused by pixel-wise multiplication, combining local and global features and enriching the expressiveness of the fused feature map, which is finally passed to a pre-trained ViT. Experimental results show that the proposed model reaches an accuracy of 97.68% in identifying individual sika deer and exhibits excellent generalization capability. Our work provides a valid database for the individual identification of sika deer, contributing significantly to the conservation of the ecosystem.
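The abstract describes a two-module front end (a DenseNet block plus a mask branch), fused by pixel-wise multiplication and fed to a ViT encoder. Below is a minimal PyTorch sketch of such a pipeline for illustration only; it is not the authors' implementation, and all module names (DenseBlock, MaskBranch, DeerFaceViT), layer sizes, and hyperparameters are assumptions chosen for clarity.

```python
# Illustrative sketch (not the paper's code) of the described pipeline:
# Module 1 (DenseNet-style block) extracts low-level features, Module 2
# predicts a soft mask to suppress background, the two outputs are fused
# by element-wise multiplication, and the fused map is fed to a ViT encoder.
import torch
import torch.nn as nn


class DenseBlock(nn.Module):
    """Small densely connected block: each layer sees all earlier feature maps."""
    def __init__(self, in_ch, growth=32, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
                nn.Conv2d(ch, growth, kernel_size=3, padding=1, bias=False)))
            ch += growth
        self.out_channels = ch

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))  # dense connectivity
        return torch.cat(feats, dim=1)


class MaskBranch(nn.Module):
    """Predicts a soft foreground mask that down-weights background pixels."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.Sigmoid())  # values in (0, 1) act as per-pixel attention

    def forward(self, x):
        return self.net(x)


class DeerFaceViT(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=384, depth=6, heads=6, n_classes=100):
        super().__init__()
        self.dense = DenseBlock(in_ch=3)                     # Module 1: low-level features
        self.mask = MaskBranch(3, self.dense.out_channels)   # Module 2: background mask
        # Patch embedding on the fused feature map instead of raw pixels
        self.patch_embed = nn.Conv2d(self.dense.out_channels, dim,
                                     kernel_size=patch, stride=patch)
        n_patches = (img_size // patch) ** 2
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        encoder_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                                   batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        self.head = nn.Linear(dim, n_classes)                # one class per individual

    def forward(self, x):
        fused = self.dense(x) * self.mask(x)          # pixel-wise fusion of the two modules
        tokens = self.patch_embed(fused).flatten(2).transpose(1, 2)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos_embed
        return self.head(self.encoder(tokens)[:, 0])  # classify from the [CLS] token


if __name__ == "__main__":
    model = DeerFaceViT(n_classes=50)
    print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 50])
```

In this sketch the ViT encoder is randomly initialized; the paper uses a pre-trained ViT, which would correspond to loading pre-trained transformer weights before fine-tuning on the deer-face data.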
Pages: 14