Enhancing Fashion Classification with Vision Transformer (ViT) and Developing Recommendation Fashion Systems Using DINOVA2

Cited by: 1

Authors
Abd Alaziz, Hadeer M. [1 ]
Elmannai, Hela [2 ]
Saleh, Hager [3 ]
Hadjouni, Myriam [4 ]
Anter, Ahmed M. [5 ,6 ]
Koura, Abdelrahim [6 ]
Kayed, Mohammed [6 ]
Affiliations
[1] Beni Suef Univ, Fac Sci, Bani Suwayf 62521, Egypt
[2] Princess Nourah bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Informat Technol, POB 84428, Riyadh 11671, Saudi Arabia
[3] South Valley Univ, Fac Comp & Artificial Intelligence, Hurghada 84511, Egypt
[4] Princess Nourah bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Comp Sci, POB 84428, Riyadh 11671, Saudi Arabia
[5] Egypt Japan Univ Sci & Technol E JUST, Alexandria 21934, Egypt
[6] Beni Suef Univ, Fac Comp & Artificial Intelligence, Bani Suwayf 62521, Egypt
Keywords
classification of fashion images; Vision Transformer (ViT); deep learning; ensemble learning; stacking; convolutional neural networks; recommendation system
DOI
10.3390/electronics12204263
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
As e-commerce platforms grow, consumers increasingly purchase clothes online; however, they often struggle to choose among the many available options. Clothing recommendation systems mediate the interaction between consumers and stores: a recommendation system can help customers find clothing they are interested in and can increase turnover. This work has two main goals: enhancing fashion classification and developing a fashion recommendation system. For classification, a Vision Transformer (ViT) is applied to improve performance. A ViT is a stack of transformer blocks, each consisting of two layers: a multi-head self-attention layer and a multilayer perceptron (MLP) layer. The ViT hyperparameters are tuned on the fashion image datasets. The baseline CNN models combine multiple convolutional layers, max-pooling layers, dropout layers, fully connected layers, and batch normalization layers. ViT is compared against these deep CNN models as well as VGG16, DenseNet-121, MobileNet, and ResNet50, using several evaluation metrics on two fashion image datasets. The ViT model performs best on the Fashion-MNIST dataset (accuracy = 95.25%, precision = 95.20%, recall = 95.25%, F1-score = 95.20%) and records the highest performance among all models on the fashion product dataset (accuracy = 98.53%, precision = 98.42%, recall = 98.53%, F1-score = 98.46%). A fashion recommendation system is then developed using DINOv2 (Learning Robust Visual Features without Supervision) and a nearest-neighbor search built with the FAISS library to retrieve the top five most similar results for a given image.
Pages: 19
Related Papers
12 items in total
  • [1] Enhance fashion classification of mosquito vector species via self-supervised vision transformer. Veerayuth Kittichai, Morakot Kaewthamasorn, Tanawat Chaiphongpachara, Sedthapong Laojun, Tawee Saiwichai, Kaung Myat Naing, Teerawat Tongloy, Siridech Boonsang, Santhad Chuwongin. Scientific Reports, 14(1).
  • [2] Efficient identification and classification of apple leaf diseases using lightweight vision transformer (ViT). Ullah, Wasi; Javed, Kashif; Khan, Muhammad Attique; Alghayadh, Faisal Yousef; Bhatt, Mohammed Wasim; Al Naimi, Imad Saud; Ofori, Isaac. Discover Sustainability, 2024, 5(1).
  • [3] Generative AI-based style recommendation using fashion item detection and classification. Kalinin, Aleksandr; Jafari, Akbar Anbar; Avots, Egils; Ozcinar, Cagri; Anbarjafari, Gholamreza. Signal, Image and Video Processing, 2024: 9179-9189.
  • [4] WMC-ViT: Waste Multi-class Classification Using a Modified Vision Transformer. Kurz, Aidan; Adams, Ethan; Depoian, Arthur C.; Bailey, Colleen P.; Guturu, Parthasarathy. 2022 IEEE MetroCon, 2022: 13-15.
  • [5] Enhancing Cervical Pre-Cancerous Classification Using Advanced Vision Transformer. Darwish, Manal; Altabel, Mohamad Ziad; Abiyev, Rahib H. Diagnostics, 2023, 13(18).
  • [6] GNViT - An enhanced image-based groundnut pest classification using Vision Transformer (ViT) model. Venkatasaichandrakanth, P.; Iyapparaja, M. PLOS ONE, 2024, 19(3).
  • [7] Hybrid Deep Learning EfficientNetV2 and Vision Transformer (EffNetV2-ViT) Model for Breast Cancer Histopathological Image Classification. Hayat, Mansoor; Ahmad, Nouman; Nasir, Anam; Ahmad Tariq, Zeeshan. IEEE Access, 2024, 12: 184119-184131.
  • [8] Thoracic computed tomography (CT) image-based identification and severity classification of COVID-19 cases using vision transformer (ViT). Taye, Gizatie Desalegn; Sisay, Zewdie Habtie; Gebeyhu, Genet Worku; Kidus, Fisha Haileslassie. Discover Applied Sciences, 2024, 6(8).
  • [9] Enhancing Dynagraph Card Classification in Pumping Systems Using Transfer Learning and the Swin Transformer Model. Dong, Guoqing; Li, Weirong; Dong, Zhenzhen; Wang, Cai; Qian, Shihao; Zhang, Tianyang; Ma, Xueling; Zou, Lu; Lin, Keze; Liu, Zhaoxia. Applied Sciences-Basel, 2024, 14(4).
  • [10] WBC YOLO-ViT: 2 way-2 stage white blood cell detection and classification with a combination of YOLOv5 and vision transformer. Tarimo, Servas Adolph; Jang, Mi-Ae; Ngasa, Emmanuel Edward; Shin, Hee Bong; Shin, Hyojin; Woo, Jiyoung. Computers in Biology and Medicine, 2024, 169.