ViT-PGC: vision transformer for pedestrian gender classification on small-size dataset

Cited by: 3
Authors
Abbas, Farhat [1 ]
Yasmin, Mussarat [1 ]
Fayyaz, Muhammad [2 ]
Asim, Usman [3 ]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Wah Campus, Wah Cantt 47040, Pakistan
[2] FAST Natl Univ Comp & Emerging Sci NUCES, Dept Comp Sci, Chiniot Faisalabad Campus, Chiniot, Punjab, Pakistan
[3] DeltaX, 3F,24,Namdaemun Ro 9 Gil, Seoul, South Korea
Keywords
Vision transformer; LSA and SPT; Deep CNN models; SS datasets; Pedestrian gender classification; CONVOLUTIONAL NEURAL-NETWORK; RECOGNITION;
DOI
10.1007/s10044-023-01196-2
CLC number
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Pedestrian gender classification (PGC) is a key task in full-body pedestrian image analysis and has become important in applications such as content-based image retrieval, visual surveillance, smart cities, and demographic data collection. Over the last decade, convolutional neural networks (CNNs) have emerged as powerful and reliable choices for vision tasks such as object classification, recognition, and detection. However, a CNN's limited local receptive field prevents it from learning global context. In contrast, a vision transformer (ViT) is an attractive alternative because its self-attention mechanism lets it attend to different patches of an input image. In this work, a ViT model equipped with two generic and effective modules, locality self-attention (LSA) and shifted patch tokenization (SPT), is explored for the PGC task. With these modules, the ViT can learn from scratch even on small-size (SS) datasets and overcome its lack of locality inductive bias. Through extensive experimentation, we found that the proposed ViT model produced better overall and mean accuracies, confirming that it outperforms state-of-the-art (SOTA) PGC methods.
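The abstract names SPT and LSA only at a high level. The following NumPy sketch (our illustration under common definitions of these modules, not the authors' code) shows the two ideas: SPT concatenates an image with four diagonally shifted copies before patchifying, and LSA replaces the fixed sqrt(d) attention scaling with a learnable temperature while masking the diagonal of the score matrix. The function names and the shift size are our assumptions.

```python
import numpy as np

def shifted_patch_tokenization(img, patch, shift=None):
    # img: (H, W, C) array. SPT concatenates the image with four
    # diagonally shifted copies along the channel axis before patch
    # embedding, enlarging each token's receptive field.
    H, W, C = img.shape
    s = shift if shift is not None else patch // 2  # assumed shift size

    def roll(dy, dx):
        return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

    stacked = np.concatenate(
        [img, roll(-s, -s), roll(-s, s), roll(s, -s), roll(s, s)], axis=-1)
    # split into non-overlapping patches and flatten each into a token
    tokens = stacked.reshape(H // patch, patch, W // patch, patch, 5 * C)
    tokens = tokens.transpose(0, 2, 1, 3, 4)
    return tokens.reshape(-1, patch * patch * 5 * C)

def locality_self_attention(q, k, v, tau):
    # LSA: a learnable temperature tau replaces the fixed sqrt(d)
    # scaling, and the diagonal of the score matrix is masked so each
    # token attends only to the *other* tokens, sharpening attention.
    scores = q @ k.T / tau
    np.fill_diagonal(scores, -np.inf)          # diagonal masking
    scores -= scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)          # row-wise softmax
    return w @ v
```

In a full model, `tau` would be a trained parameter (one per head) and the tokens from `shifted_patch_tokenization` would pass through a linear embedding before the transformer blocks; the sketch keeps only the parts that distinguish these modules from a vanilla ViT.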
Pages: 1805 / 1819
Page count: 15
Related papers
50 records in total
  • [31] Hardware Design of Lightweight Binary Classification Algorithms for Small-Size Images on FPGA
    Saglam, Serkan
    Bayar, Salih
    IEEE ACCESS, 2024, 12 : 57225 - 57235
  • [32] The DeepFish computer vision dataset for fish instance segmentation, classification, and size estimation
    Nahuel Garcia-d’Urso
    Alejandro Galan-Cuenca
    Paula Pérez-Sánchez
    Pau Climent-Pérez
    Andres Fuster-Guillo
    Jorge Azorin-Lopez
    Marcelo Saval-Calvo
    Juan Eduardo Guillén-Nieto
    Gabriel Soler-Capdepón
    Scientific Data, 9
  • [33] CWC-MP-MC Image-based breast tumor classification using an optimized Vision Transformer (ViT)
    Kabir, Shahriar Mahmud
    Bhuiyan, Mohammed Imamul Hassan
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 100
  • [34] Sediment Classification of Small-Size Seabed Acoustic Images Using Convolutional Neural Networks
    Luo, Xiaowen
    Qin, Xiaoming
    Wu, Ziyin
    Yang, Fanlin
    Wang, Mingwei
    Shang, Jihong
    IEEE ACCESS, 2019, 7 : 98331 - 98339
  • [35] TransMCGC: a recast vision transformer for small-scale image classification tasks
    Xiang, Jian-Wen
    Chen, Min-Rong
    Li, Pei-Shan
    Zou, Hao-Li
    Li, Shi-Da
    Huang, Jun-Jie
    NEURAL COMPUTING & APPLICATIONS, 2023, 35 (10): : 7697 - 7718
  • [37] ViT-DexiNet: a vision transformer-based edge detection operator for small object detection in SAR images
    Sivapriya, M. S.
    Suresh, S.
    INTERNATIONAL JOURNAL OF REMOTE SENSING, 2023, 44 (22) : 7057 - 7084
  • [38] ViT-UNet: A Vision Transformer Based UNet Model for Coastal Wetland Classification Based on High Spatial Resolution Imagery
    Zhou, Nan
    Xu, Mingming
    Shen, Biaoqun
    Hou, Ke
    Liu, Shanwei
    Sheng, Hui
    Liu, Yanfen
    Wan, Jianhua
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2024, 17 : 19575 - 19587
  • [39] MAT-VIT:A Vision Transformer with MAE-Based Self-Supervised Auxiliary Task for Medical Image Classification
    Han, Yufei
    Chen, Haoyuan
    Yao, Linwei
    Li, Kuan
    Yin, Jianping
    PROCEEDINGS OF THE 2024 27TH INTERNATIONAL CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK IN DESIGN, CSCWD 2024, 2024, : 2040 - 2046
  • [40] Design of experiments informed deep learning for modeling of directed energy deposition process with a small-size experimental dataset
    Chen, Chengxi
    Wong, Stanley Jian Liang
    Raghavan, Srinivasan
    Li, Hua
    MATERIALS & DESIGN, 2022, 222