ViT-PGC: vision transformer for pedestrian gender classification on small-size dataset

Cited by: 3
Authors
Abbas, Farhat [1]
Yasmin, Mussarat [1]
Fayyaz, Muhammad [2]
Asim, Usman [3]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Wah Campus, WahCantt 47040, Pakistan
[2] FAST Natl Univ Comp & Emerging Sci NUCES, Dept Comp Sci, Chiniot Faisalabad Campus, Chiniot, Punjab, Pakistan
[3] DeltaX, 3F,24,Namdaemun Ro 9 Gil, Seoul, South Korea
Keywords
Vision transformer; LSA and SPT; Deep CNN models; SS datasets; Pedestrian gender classification; CONVOLUTIONAL NEURAL-NETWORK; RECOGNITION
DOI
10.1007/s10044-023-01196-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Pedestrian gender classification (PGC) is a key task in full-body pedestrian image analysis and has become important in applications such as content-based image retrieval, visual surveillance, smart cities, and demographic data collection. Over the last decade, convolutional neural networks (CNNs) have shown great potential and become reliable choices for vision tasks such as object classification, recognition, and detection. However, CNNs have a limited local receptive field, which prevents them from learning global context. In contrast, a vision transformer (ViT) is a better alternative to CNNs because it uses a self-attention mechanism to attend to different patches of an input image. In this work, a vision transformer model based on two generic and effective modules, locality self-attention (LSA) and shifted patch tokenization (SPT), is explored for the PGC task. With these modules, the ViT can learn from scratch even on small-size (SS) datasets and overcome its lack of locality inductive bias. Through extensive experimentation, we found that the proposed ViT model produced better results in terms of overall and mean accuracies, confirming that it outperforms state-of-the-art (SOTA) PGC methods.
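The abstract names SPT and LSA without describing them. The sketch below illustrates the two ideas as they are commonly described in the small-dataset ViT literature, and it is an assumption rather than the authors' implementation: SPT concatenates the input with four copies shifted diagonally by half a patch before patch splitting, and LSA applies a learnable softmax temperature and masks each token's attention to itself. The function names, the single-channel list-of-lists image, and the fixed `temperature` value are hypothetical choices made for illustration.

```python
import math


def shift_image(img, dy, dx):
    """Shift a 2-D image by (dy, dx) pixels; vacated pixels are zero-filled."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out


def shifted_patch_channels(img, patch_size):
    """SPT-style input (assumed form): original plus four diagonal shifts.

    The shift distance is half the patch size. The returned list of five
    images would then be split into patches and linearly projected.
    """
    s = patch_size // 2
    offsets = [(-s, -s), (-s, s), (s, -s), (s, s)]
    return [img] + [shift_image(img, dy, dx) for dy, dx in offsets]


def locality_self_attention_weights(scores, temperature):
    """LSA-style attention weights (assumed form) for an n x n score matrix.

    Scores are divided by a temperature (learnable in a real model, fixed
    here) and the diagonal is masked with -inf so that no token attends to
    itself. Requires n >= 2, otherwise a row would be fully masked.
    """
    weights = []
    for i, row in enumerate(scores):
        scaled = [-math.inf if j == i else s / temperature
                  for j, s in enumerate(row)]
        m = max(scaled)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - m) for v in scaled]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


if __name__ == "__main__":
    image = [[1, 2], [3, 4]]
    print(len(shifted_patch_channels(image, patch_size=2)))
    scores = [[5.0, 1.0, 1.0], [0.0, 9.0, 0.0], [2.0, 2.0, 7.0]]
    for row in locality_self_attention_weights(scores, temperature=0.5):
        print(row)
```

Masking the diagonal forces each token to distribute its attention over the other tokens, and a temperature below 1 sharpens that distribution. Both effects are usually cited as ways to strengthen locality when training data is limited.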
Pages: 1805-1819
Number of pages: 15
Related papers
50 in total
  • [1] ViT-PGC: vision transformer for pedestrian gender classification on small-size dataset
    Farhat Abbas
    Mussarat Yasmin
    Muhammad Fayyaz
    Usman Asim
    Pattern Analysis and Applications, 2023, 26 : 1805 - 1819
  • [2] Improving Vision Transformers to Learn Small-Size Dataset From Scratch
    Lee, Seunghoon
    Lee, Seunghyun
    Song, Byung Cheol
    IEEE ACCESS, 2022, 10 : 123212 - 123224
  • [3] Vision Transformer (ViT)-based Applications in Image Classification
    Huo, Yingzi
    Jin, Kai
    Cai, Jiahong
    Xiong, Huixuan
    Pang, Jiacheng
    2023 IEEE 9TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD, BIGDATASECURITY, IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING, HPSC AND IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY, IDS, 2023, : 135 - 140
  • [4] A ViT Vision Transformer Model for Rose Leaf Disease Classification
    Saini, Archana
    Guleria, Kalpna
    Sharma, Shagun
    2024 2ND WORLD CONFERENCE ON COMMUNICATION & COMPUTING, WCONF 2024, 2024,
  • [5] Shot-ViT: Cricket Batting Shots Classification with Vision Transformer Network
    Dey, A.
    Biswas, S.
    INTERNATIONAL JOURNAL OF ENGINEERING, 2024, 37 (12): : 2463 - 2472
  • [6] MIL-ViT: A multiple instance vision transformer for fundus image classification
    Bi, Qi
    Sun, Xu
    Yu, Shuang
    Ma, Kai
    Bian, Cheng
    Ning, Munan
    He, Nanjun
    Huang, Yawen
    Li, Yuexiang
    Liu, Hanruo
    Zheng, Yefeng
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 97
  • [7] A Channel-Cascading Pedestrian Detection Network for Small-Size Pedestrians
    He, Jiaojiao
    Liu, Ken
    Zhang, Yongping
    Yao, Tuozhong
    Zhao, Zhongjie
    Xiao, Jiangjian
    Peng, Chengbin
    Aguilar, Wilbert G.
    Sandoval, David S.
    Caballeros, Jessica
    Alvarez, Leandro G.
    Limaico, Alex
    Rodriguez, Guillermo A.
    Quisaguano, Fernando J.
    INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING, 2018, 11266 : 325 - 338
  • [8] DHVT: Dynamic Hybrid Vision Transformer for Small Dataset Recognition
    Lu, Zhiying
    Liu, Chuanbin
    Chang, Xiaojun
    Zhang, Yongdong
    Xie, Hongtao
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47 (04) : 2615 - 2631
  • [9] Efficient identification and classification of apple leaf diseases using lightweight vision transformer (ViT)
    Ullah, Wasi
    Javed, Kashif
    Khan, Muhammad Attique
    Alghayadh, Faisal Yousef
    Bhatt, Mohammed Wasim
    Al Naimi, Imad Saud
    Ofori, Isaac
    DISCOVER SUSTAINABILITY, 2024, 5 (01):
  • [10] Order-ViT: Order Learning Vision Transformer for Cancer Classification in Pathology Images
    Lee, Ju Cheon
    Kwak, Jin Tae
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 2485 - 2494