ViT-PGC: vision transformer for pedestrian gender classification on small-size dataset

Cited by: 3
Authors
Abbas, Farhat [1 ]
Yasmin, Mussarat [1 ]
Fayyaz, Muhammad [2 ]
Asim, Usman [3 ]
Affiliations
[1] COMSATS Univ Islamabad, Dept Comp Sci, Wah Campus, Wah Cantt 47040, Pakistan
[2] FAST Natl Univ Comp & Emerging Sci NUCES, Dept Comp Sci, Chiniot Faisalabad Campus, Chiniot, Punjab, Pakistan
[3] DeltaX, 3F,24,Namdaemun Ro 9 Gil, Seoul, South Korea
Keywords
Vision transformer; LSA and SPT; Deep CNN models; SS datasets; Pedestrian gender classification; CONVOLUTIONAL NEURAL-NETWORK; RECOGNITION;
DOI
10.1007/s10044-023-01196-2
CLC number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Pedestrian gender classification (PGC) is a key task in full-body pedestrian image analysis and has become an important component of applications such as content-based image retrieval, visual surveillance, smart cities, and demographic data collection. Over the last decade, convolutional neural networks (CNNs) have emerged as powerful and reliable choices for vision tasks such as object classification, recognition, and detection. However, a CNN's limited local receptive field prevents it from learning global context. In contrast, a vision transformer (ViT) is an attractive alternative because its self-attention mechanism lets every patch of an input image attend to every other patch. In this work, a ViT model equipped with two generic and effective modules, locality self-attention (LSA) and shifted patch tokenization (SPT), is explored for the PGC task. With these modules, the ViT can be trained from scratch even on small-size (SS) datasets, overcoming the lack of locality inductive bias. Extensive experiments show that the proposed ViT model achieves better overall and mean accuracies, confirming that it outperforms state-of-the-art (SOTA) PGC methods.
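The abstract names two concrete modules, shifted patch tokenization (SPT) and locality self-attention (LSA), that allow a ViT to be trained from scratch on small datasets. The minimal PyTorch sketch below illustrates both under stated assumptions: the module names, tensor shapes, and hyper-parameters (embedding dimension 192, 3 heads, 16x16 patches) are illustrative rather than the authors' implementation, and the cyclic torch.roll used for the spatial shifts is a simplification of zero-padded shifting.

# Hedged sketch of SPT and LSA as commonly formulated for small-dataset ViTs.
# All names, shapes, and hyper-parameters are illustrative assumptions.
import torch
import torch.nn as nn


class ShiftedPatchTokenization(nn.Module):
    """Concatenate the image with 4 half-patch diagonal shifts before patch embedding."""

    def __init__(self, in_ch=3, patch_size=16, dim=192):
        super().__init__()
        self.patch_size = patch_size
        # 5 = original image + 4 shifted copies
        self.proj = nn.Conv2d(5 * in_ch, dim, kernel_size=patch_size, stride=patch_size)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):                      # x: (B, C, H, W)
        s = self.patch_size // 2
        shifts = [(-s, -s), (-s, s), (s, -s), (s, s)]   # (dy, dx) diagonal shifts
        # cyclic roll used for brevity; zero-padded shifts are the usual formulation
        shifted = [torch.roll(x, sh, dims=(2, 3)) for sh in shifts]
        x = torch.cat([x] + shifted, dim=1)    # (B, 5C, H, W)
        x = self.proj(x)                       # (B, dim, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)       # (B, N, dim) patch tokens
        return self.norm(x)


class LocalitySelfAttention(nn.Module):
    """Self-attention with a learnable temperature and diagonal (self-token) masking."""

    def __init__(self, dim=192, heads=3):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.out = nn.Linear(dim, dim)
        # learnable temperature replaces the fixed 1/sqrt(d) scaling
        self.temperature = nn.Parameter(torch.tensor((dim // heads) ** -0.5))

    def forward(self, x):                      # x: (B, N, dim)
        B, N, D = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.heads, D // self.heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)   # each: (B, heads, N, d_head)
        attn = (q @ k.transpose(-2, -1)) * self.temperature
        # diagonal masking: a token may not attend to itself, which sharpens
        # attention on neighbouring (local) tokens after the softmax
        mask = torch.eye(N, dtype=torch.bool, device=x.device)
        attn = attn.masked_fill(mask, float("-inf"))
        attn = attn.softmax(dim=-1)
        x = (attn @ v).transpose(1, 2).reshape(B, N, D)
        return self.out(x)


if __name__ == "__main__":
    tok = ShiftedPatchTokenization()
    lsa = LocalitySelfAttention()
    imgs = torch.randn(2, 3, 128, 64)          # pedestrian crops are typically tall
    tokens = tok(imgs)                          # (2, 32, 192) for 16x16 patches
    print(lsa(tokens).shape)                    # torch.Size([2, 32, 192])

In this reading, SPT widens each token's effective receptive field by stacking shifted copies of the image before embedding, while LSA's masking and learnable temperature sharpen the attention distribution, the two properties the abstract credits with compensating for the missing locality inductive bias on small datasets.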
Pages: 1805-1819
Number of pages: 15
Related papers (50 records)
  • [11] Pose measurement of small-size aircraft based on machine vision
    Li, Yunhui
    Fang, Ou
    Miao, Zhonghua
    Huo, Ju
    2021 PROCEEDINGS OF THE 40TH CHINESE CONTROL CONFERENCE (CCC), 2021, : 6673 - 6678
  • [12] Plant-CNN-ViT: Plant Classification with Ensemble of Convolutional Neural Networks and Vision Transformer
    Lee, Chin Poo
    Lim, Kian Ming
    Song, Yu Xuan
    Alqahtani, Ali
    PLANTS-BASEL, 2023, 12 (14):
  • [13] CFFI-Vit: Enhanced Vision Transformer for the Accurate Classification of Fish Feeding Intensity in Aquaculture
    Liu, Jintao
    Becerra, Alfredo Tolon
    Bienvenido-Barcena, Jose Fernando
    Yang, Xinting
    Zhao, Zhenxi
    Zhou, Chao
    JOURNAL OF MARINE SCIENCE AND ENGINEERING, 2024, 12 (07)
  • [14] WMC-ViT: Waste Multi-class Classification Using a Modified Vision Transformer
    Kurz, Aidan
    Adams, Ethan
    Depoian, Arthur C.
    Bailey, Colleen P.
    Guturu, Parthasarathy
    2022 IEEE METROCON, 2022, : 13 - 15
  • [15] CLASSIFICATION OF INTRACRANIAL HEMORRHAGE BASED ON CT-SCAN IMAGE WITH VISION TRANSFORMER (VIT) METHOD
    Faiz, Muhammad Nur
    Badriyah, Tessy
    Kusuma, Selvia Ferdiana
    2024 INTERNATIONAL ELECTRONICS SYMPOSIUM, IES 2024, 2024, : 454 - 459
  • [16] Transforming Alzheimer's Disease Diagnosis: Implementing Vision Transformer (ViT) for MRI Images Classification
    Kurniasari, Dian
    Pratama, Muhammad Dwi
    Junaidi, Akmal
    Faisol, Ahmad
    JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA, 2025, 24 (01): : 130 - 152
  • [17] Cancer detection for small-size and ambiguous tumors based on semantic FPN and transformer
    He, Jingzhen
    Wang, Jing
    Han, Zeyu
    Li, Baojun
    Lv, Mei
    Shi, Yunfeng
    PLOS ONE, 2023, 18 (02):
  • [18] Small-size Pedestrian Detection in Large Scene Based on Fast R-CNN
    Wang, Shengke
    Yang, Na
    Duan, Lianghua
    Liu, Lu
    Dong, Junyu
    NINTH INTERNATIONAL CONFERENCE ON GRAPHIC AND IMAGE PROCESSING (ICGIP 2017), 2018, 10615
  • [19] MASK-VIT: AN OBJECT MASK EMBEDDING IN VISION TRANSFORMER FOR FINE-GRAINED VISUAL CLASSIFICATION
    Su, Tong
    Ye, Shuo
    Song, Chengqun
    Cheng, Jun
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1626 - 1630
  • [20] LT-VIT: A VISION TRANSFORMER FOR MULTI-LABEL CHEST X-RAY CLASSIFICATION
    Marikkar, Umar
    Atito, Sara
    Awais, Muhammad
    Mahdi, Adam
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 2565 - 2569