ViT-PGC: vision transformer for pedestrian gender classification on small-size dataset

Cited by: 3
Authors
Abbas, Farhat [1 ]
Yasmin, Mussarat [1 ]
Fayyaz, Muhammad [2 ]
Asim, Usman [3 ]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Wah Campus, Wah Cantt 47040, Pakistan
[2] FAST Natl Univ Comp & Emerging Sci NUCES, Dept Comp Sci, Chiniot Faisalabad Campus, Chiniot, Punjab, Pakistan
[3] DeltaX, 3F,24,Namdaemun Ro 9 Gil, Seoul, South Korea
Keywords
Vision transformer; LSA and SPT; Deep CNN models; SS datasets; Pedestrian gender classification; CONVOLUTIONAL NEURAL-NETWORK; RECOGNITION;
DOI
10.1007/s10044-023-01196-2
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Code
081104; 0812; 0835; 1405;
Abstract
Pedestrian gender classification (PGC) is a key task in full-body pedestrian image analysis and has become important in applications such as content-based image retrieval, visual surveillance, smart cities, and demographic data collection. Over the last decade, convolutional neural networks (CNNs) have emerged as a powerful and reliable choice for vision tasks such as object classification, recognition, and detection. However, a CNN's limited local receptive field prevents it from learning global context. A vision transformer (ViT) is a better alternative because its self-attention mechanism attends to every patch of the input image. In this work, a ViT model built on two generic and effective modules, locality self-attention (LSA) and shifted patch tokenization (SPT), is explored for the PGC task. With these modules, the ViT can learn from scratch even on small-size (SS) datasets and overcome the lack of locality inductive bias. Extensive experimentation shows that the proposed ViT model produces better overall and mean accuracies, confirming that it outperforms state-of-the-art (SOTA) PGC methods.
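For readers unfamiliar with the two modules named in the abstract, a minimal NumPy sketch may help. It is a simplified illustration, not the authors' implementation: SPT here uses cyclic shifts (`np.roll`) in place of zero-padded shifts, and the function names and the tiny input sizes are my own choices.

```python
import numpy as np

def shifted_patch_tokenization(img, patch=4, shift=2):
    """Shifted Patch Tokenization (SPT), sketched: concatenate the image
    with four diagonally shifted copies along the channel axis, then split
    the result into non-overlapping flattened patch tokens. This widens the
    receptive field of each token, compensating for ViT's weak locality bias
    on small datasets. (Cyclic shift used here as a simplification.)"""
    H, W, C = img.shape
    shifts = [(shift, shift), (shift, -shift), (-shift, shift), (-shift, -shift)]
    views = [img] + [np.roll(img, s, axis=(0, 1)) for s in shifts]
    x = np.concatenate(views, axis=-1)                  # (H, W, 5C)
    tokens = x.reshape(H // patch, patch, W // patch, patch, 5 * C)
    tokens = tokens.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * 5 * C)
    return tokens                                       # (num_patches, patch^2 * 5C)

def locality_self_attention(q, k, v, tau):
    """Locality Self-Attention (LSA), sketched: a learnable temperature tau
    replaces the fixed sqrt(d) scaling, and the diagonal of the score matrix
    is masked so a token cannot trivially attend to itself, sharpening the
    attention distribution over the other tokens."""
    scores = q @ k.T / tau
    np.fill_diagonal(scores, -np.inf)                   # diagonal masking
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                  # row-wise softmax
    return w @ v
```

On an 8x8x3 toy image with 4x4 patches, SPT yields 4 tokens of dimension 4*4*15 = 240; in a full model these tokens would be linearly projected before entering the LSA-equipped transformer blocks.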
Pages: 1805-1819 (15 pages)
Related papers
50 records in total
  • [21] AnisotropicBreast-ViT: Breast Cancer Classification in Ultrasound Images Using Anisotropic Filtering and Vision Transformer
    Diniz, Joao Otavio Bandeira
    Ribeiro, Neilson P.
    Dias, Domingos A., Jr.
    da Cruz, Luana B.
    da Silva, Giovanni L. F.
    Gomes, Daniel L., Jr.
    de Paiva, Anselmo C.
    Silva, Aristofanes C.
    INTELLIGENT SYSTEMS, BRACIS 2024, PT III, 2025, 15414 : 95 - 109
  • [22] ViT-DualAtt: An efficient pornographic image classification method based on Vision Transformer with dual attention
    Cai, Zengyu
    Xu, Liusen
    Zhang, Jianwei
    Feng, Yuan
    Zhu, Liang
    Liu, Fangmei
    ELECTRONIC RESEARCH ARCHIVE, 2024, 32 (12): : 6698 - 6716
  • [23] SI-ViT: Shuffle instance-based Vision Transformer for pancreatic cancer ROSE image classification
    Zhang, Tianyi
    Feng, Youdan
    Zhao, Yu
    Lei, Yanli
    Ying, Nan
    Song, Fan
    He, Yufang
    Yan, Zhiling
    Feng, Yunlu
    Yang, Aiming
    Zhang, Guanglei
    COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE, 2024, 244
  • [24] Effects of dataset curation on body condition score (BCS) determination with a vision transformer (ViT) applied to RGB+depth images
    Winkler, Zachary
    Boucheron, Laura E.
    Utsumi, Santiago
    Nyamuryekung'e, Shelemia
    Mcintosh, Matthew
    Estell, Richard E.
    SMART AGRICULTURAL TECHNOLOGY, 2024, 8
  • [25] White Blood Cell Classification: Convolutional Neural Network (CNN) and Vision Transformer (ViT) under Medical Microscope
    Abou Ali, Mohamad
    Dornaika, Fadi
    Arganda-Carreras, Ignacio
    ALGORITHMS, 2023, 16 (11)
  • [26] ViT-P: Classification of Genitourinary Syndrome of Menopause From OCT Images Based on Vision Transformer Models
    Wang, Haoran
    Ji, Yanju
    Song, Kaiwen
    Sun, Mingyang
    Lv, Peitong
    Zhang, Tianyu
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2021, 70
  • [27] Enhancing Fashion Classification with Vision Transformer (ViT) and Developing Recommendation Fashion Systems Using DINOVA2
    Abd Alaziz, Hadeer M.
    Elmannai, Hela
    Saleh, Hager
    Hadjouni, Myriam
    Anter, Ahmed M.
    Koura, Abdelrahim
    Kayed, Mohammed
    ELECTRONICS, 2023, 12 (20)
  • [28] GNViT- An enhanced image-based groundnut pest classification using Vision Transformer (ViT) model
    Venkatasaichandrakanth, P.
    Iyapparaja, M.
    PLOS ONE, 2024, 19 (03):
  • [29] Patient teacher can impart locality to improve lightweight vision transformer on small dataset
    Ling, Jun
    Zhang, Xuan
    Du, Fei
    Li, Linyu
    Shang, Weiyi
    Gao, Chen
    Li, Tong
    PATTERN RECOGNITION, 2025, 157
  • [30] The DeepFish computer vision dataset for fish instance segmentation, classification, and size estimation
    Garcia-d'Urso, Nahuel
    Galan-Cuenca, Alejandro
    Perez-Sanchez, Paula
    Climent-Perez, Pau
    Fuster-Guillo, Andres
    Azorin-Lopez, Jorge
    Saval-Calvo, Marcelo
    Guillen-Nieto, Juan Eduardo
    Soler-Capdepon, Gabriel
    SCIENTIFIC DATA, 2022, 9 (01)