ViT-PGC: vision transformer for pedestrian gender classification on small-size dataset

Cited by: 3
Authors
Abbas, Farhat [1 ]
Yasmin, Mussarat [1 ]
Fayyaz, Muhammad [2 ]
Asim, Usman [3 ]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Wah Campus, Wah Cantt 47040, Pakistan
[2] FAST Natl Univ Comp & Emerging Sci NUCES, Dept Comp Sci, Chiniot Faisalabad Campus, Chiniot, Punjab, Pakistan
[3] DeltaX, 3F,24,Namdaemun Ro 9 Gil, Seoul, South Korea
Keywords
Vision transformer; LSA and SPT; Deep CNN models; SS datasets; Pedestrian gender classification; CONVOLUTIONAL NEURAL-NETWORK; RECOGNITION;
DOI
10.1007/s10044-023-01196-2
CLC number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Pedestrian gender classification (PGC) is a key task in full-body pedestrian image analysis and has become an important component of applications such as content-based image retrieval, visual surveillance, smart cities, and demographic data collection. Over the last decade, convolutional neural networks (CNNs) have emerged as powerful and reliable choices for vision tasks such as object classification, recognition, and detection. However, a CNN's limited local receptive field prevents it from learning global context. In contrast, a vision transformer (ViT) is an attractive alternative because its self-attention mechanism lets every patch of an input image attend to every other patch. In this work, a ViT model equipped with two generic and effective modules, locality self-attention (LSA) and shifted patch tokenization (SPT), is explored for the PGC task. With these modules, the ViT can be trained from scratch even on small-size (SS) datasets, overcoming the lack of locality inductive bias. Extensive experiments show that the proposed ViT model achieves better overall and mean accuracies, confirming that it outperforms state-of-the-art (SOTA) PGC methods.
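The abstract names two concrete modules, shifted patch tokenization (SPT) and locality self-attention (LSA), that allow a ViT to be trained from scratch on small datasets. The minimal PyTorch sketch below illustrates both under stated assumptions: the module names, tensor shapes, and hyper-parameters (embedding dimension 192, 3 heads, 16x16 patches) are illustrative rather than the authors' implementation, and the cyclic torch.roll used for the spatial shifts is a simplification of zero-padded shifting.

# Hedged sketch of SPT and LSA as commonly formulated for small-dataset ViTs.
# All names, shapes, and hyper-parameters are illustrative assumptions.
import torch
import torch.nn as nn


class ShiftedPatchTokenization(nn.Module):
    """Concatenate the image with 4 half-patch diagonal shifts before patch embedding."""

    def __init__(self, in_ch=3, patch_size=16, dim=192):
        super().__init__()
        self.patch_size = patch_size
        # 5 = original image + 4 shifted copies
        self.proj = nn.Conv2d(5 * in_ch, dim, kernel_size=patch_size, stride=patch_size)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (B, C, H, W)
        s = self.patch_size // 2
        shifts = [(-s, -s), (-s, s), (s, -s), (s, s)]   # (dy, dx) diagonal shifts
        # cyclic roll used for brevity; zero-padded shifts are the usual formulation
        shifted = [torch.roll(x, sh, dims=(2, 3)) for sh in shifts]
        x = torch.cat([x] + shifted, dim=1)    # (B, 5C, H, W)
        x = self.proj(x)                       # (B, dim, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)       # (B, N, dim) patch tokens
        return self.norm(x)


class LocalitySelfAttention(nn.Module):
    """Self-attention with a learnable temperature and diagonal (self-token) masking."""

    def __init__(self, dim=192, heads=3):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.out = nn.Linear(dim, dim)
        # learnable temperature replaces the fixed 1/sqrt(d) scaling
        self.temperature = nn.Parameter(torch.tensor((dim // heads) ** -0.5))

    def forward(self, x):                      # x: (B, N, dim)
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.heads, D // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)   # each: (B, heads, N, d_head)
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        # diagonal masking: a token may not attend to itself, which sharpens
        # attention on neighbouring (local) tokens after the softmax
        mask = torch.eye(N, dtype=torch.bool, device=x.device)
        attn = attn.masked_fill(mask, float("-inf"))
        attn = attn.softmax(dim=-1)
        x = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.out(x)


if __name__ == "__main__":
    tok = ShiftedPatchTokenization()
    lsa = LocalitySelfAttention()
    imgs = torch.randn(2, 3, 128, 64)          # pedestrian crops are typically tall
    tokens = tok(imgs)                          # (2, 32, 192) for 16x16 patches
    print(lsa(tokens).shape)                    # torch.Size([2, 32, 192])

In this reading, SPT widens each token's effective receptive field by stacking shifted copies of the image before embedding, while LSA's masking and learnable temperature sharpen the attention distribution, the two properties the abstract credits with compensating for the missing locality inductive bias on small datasets.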
Pages: 1805-1819
Number of pages: 15
Related papers (50 records)
  • [11] Pose measurement of small-size aircraft based on machine vision
    Li, Yunhui
    Fang, Ou
    Miao, Zhonghua
    Huo, Ju
    2021 PROCEEDINGS OF THE 40TH CHINESE CONTROL CONFERENCE (CCC), 2021, : 6673 - 6678
  • [12] Plant-CNN-ViT: Plant Classification with Ensemble of Convolutional Neural Networks and Vision Transformer
    Lee, Chin Poo
    Lim, Kian Ming
    Song, Yu Xuan
    Alqahtani, Ali
    PLANTS-BASEL, 2023, 12 (14):
  • [13] CFFI-Vit: Enhanced Vision Transformer for the Accurate Classification of Fish Feeding Intensity in Aquaculture
    Liu, Jintao
    Becerra, Alfredo Tolon
    Bienvenido-Barcena, Jose Fernando
    Yang, Xinting
    Zhao, Zhenxi
    Zhou, Chao
    JOURNAL OF MARINE SCIENCE AND ENGINEERING, 2024, 12 (07)
  • [14] WMC-ViT: Waste Multi-class Classification Using a Modified Vision Transformer
    Kurz, Aidan
    Adams, Ethan
    Depoian, Arthur C.
    Bailey, Colleen P.
    Guturu, Parthasarathy
    2022 IEEE METROCON, 2022, : 13 - 15
  • [15] CLASSIFICATION OF INTRACRANIAL HEMORRHAGE BASED ON CT-SCAN IMAGE WITH VISION TRANSFORMER (VIT) METHOD
    Faiz, Muhammad Nur
    Badriyah, Tessy
    Kusuma, Selvia Ferdiana
    2024 INTERNATIONAL ELECTRONICS SYMPOSIUM, IES 2024, 2024, : 454 - 459
  • [16] Transforming Alzheimer's Disease Diagnosis: Implementing Vision Transformer (ViT) for MRI Images Classification
    Kurniasari, Dian
    Pratama, Muhammad Dwi
    Junaidi, Akmal
    Faisol, Ahmad
    JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA, 2025, 24 (01): : 130 - 152
  • [17] Cancer detection for small-size and ambiguous tumors based on semantic FPN and transformer
    He, Jingzhen
    Wang, Jing
    Han, Zeyu
    Li, Baojun
    Lv, Mei
    Shi, Yunfeng
    PLOS ONE, 2023, 18 (02):
  • [18] Small-size Pedestrian Detection in Large Scene Based on Fast R-CNN
    Wang, Shengke
    Yang, Na
    Duan, Lianghua
    Liu, Lu
    Dong, Junyu
    NINTH INTERNATIONAL CONFERENCE ON GRAPHIC AND IMAGE PROCESSING (ICGIP 2017), 2018, 10615
  • [19] MASK-VIT: AN OBJECT MASK EMBEDDING IN VISION TRANSFORMER FOR FINE-GRAINED VISUAL CLASSIFICATION
    Su, Tong
    Ye, Shuo
    Song, Chengqun
    Cheng, Jun
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1626 - 1630
  • [20] LT-VIT: A VISION TRANSFORMER FOR MULTI-LABEL CHEST X-RAY CLASSIFICATION
    Marikkar, Umar
    Atito, Sara
    Awais, Muhammad
    Mahdi, Adam
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 2565 - 2569