Enhancing Fashion Classification with Vision Transformer (ViT) and Developing Recommendation Fashion Systems Using DINOVA2

Cited by: 1

Authors
Abd Alaziz, Hadeer M. [1 ]
Elmannai, Hela [2 ]
Saleh, Hager [3 ]
Hadjouni, Myriam [4 ]
Anter, Ahmed M. [5 ,6 ]
Koura, Abdelrahim [6 ]
Kayed, Mohammed [6 ]
Affiliations
[1] Beni Suef Univ, Fac Sci, Bani Suwayf 62521, Egypt
[2] Princess Nourah bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Informat Technol, POB 84428, Riyadh 11671, Saudi Arabia
[3] South Valley Univ, Fac Comp & Artificial Intelligence, Hurghada 84511, Egypt
[4] Princess Nourah bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Comp Sci, POB 84428, Riyadh 11671, Saudi Arabia
[5] Egypt Japan Univ Sci & Technol E JUST, Alexandria 21934, Egypt
[6] Beni Suef Univ, Fac Comp & Artificial Intelligence, Bani Suwayf 62521, Egypt
Keywords
classification of fashion images; Vision Transformer (ViT); deep learning; ensemble learning; stacking; convolutional neural networks; recommendation system
DOI
10.3390/electronics12204263
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
As e-commerce platforms grow, consumers increasingly purchase clothes online; however, they often struggle to choose among the many available options. Clothing recommendation systems mediate the interaction between consumers and stores: a recommendation system can help customers find clothing they are interested in and can increase turnover. This work has two main goals: enhancing fashion classification and developing a fashion recommendation system. For classification, a Vision Transformer (ViT) is applied to improve performance. A ViT is a stack of transformer blocks, each consisting of two layers: a multi-head self-attention layer and a multilayer perceptron (MLP) layer. The ViT hyperparameters are tuned on the fashion image datasets. The baseline CNN models combine multiple convolutional layers, max-pooling layers, dropout layers, fully connected layers, and batch normalization layers. ViT is compared against these deep CNN models as well as VGG16, DenseNet-121, MobileNet, and ResNet50, using several evaluation metrics on two fashion image datasets. The ViT model performs best on the Fashion-MNIST dataset (accuracy = 95.25%, precision = 95.20%, recall = 95.25%, F1-score = 95.20%) and records the highest performance among all models on the fashion product dataset (accuracy = 98.53%, precision = 98.42%, recall = 98.53%, F1-score = 98.46%). A fashion recommendation system is then developed using DINOv2 (Learning Robust Visual Features without Supervision) and a nearest-neighbor search built with the FAISS library to retrieve the top five most similar results for a given image.
Pages: 19
Related Papers
12 items in total
  • [1] Enhance fashion classification of mosquito vector species via self-supervised vision transformer. Veerayuth Kittichai, Morakot Kaewthamasorn, Tanawat Chaiphongpachara, Sedthapong Laojun, Tawee Saiwichai, Kaung Myat Naing, Teerawat Tongloy, Siridech Boonsang, Santhad Chuwongin. Scientific Reports, 14(1).
  • [2] Efficient identification and classification of apple leaf diseases using lightweight vision transformer (ViT). Ullah, Wasi; Javed, Kashif; Khan, Muhammad Attique; Alghayadh, Faisal Yousef; Bhatt, Mohammed Wasim; Al Naimi, Imad Saud; Ofori, Isaac. Discover Sustainability, 2024, 5(1).
  • [3] Generative AI-based style recommendation using fashion item detection and classification. Kalinin, Aleksandr; Jafari, Akbar Anbar; Avots, Egils; Ozcinar, Cagri; Anbarjafari, Gholamreza. Signal, Image and Video Processing, 2024: 9179-9189.
  • [4] WMC-ViT: Waste Multi-class Classification Using a Modified Vision Transformer. Kurz, Aidan; Adams, Ethan; Depoian, Arthur C.; Bailey, Colleen P.; Guturu, Parthasarathy. 2022 IEEE MetroCon, 2022: 13-15.
  • [5] Enhancing Cervical Pre-Cancerous Classification Using Advanced Vision Transformer. Darwish, Manal; Altabel, Mohamad Ziad; Abiyev, Rahib H. Diagnostics, 2023, 13(18).
  • [6] GNViT - An enhanced image-based groundnut pest classification using Vision Transformer (ViT) model. Venkatasaichandrakanth, P.; Iyapparaja, M. PLOS ONE, 2024, 19(3).
  • [7] Hybrid Deep Learning EfficientNetV2 and Vision Transformer (EffNetV2-ViT) Model for Breast Cancer Histopathological Image Classification. Hayat, Mansoor; Ahmad, Nouman; Nasir, Anam; Ahmad Tariq, Zeeshan. IEEE Access, 2024, 12: 184119-184131.
  • [8] Thoracic computed tomography (CT) image-based identification and severity classification of COVID-19 cases using vision transformer (ViT). Taye, Gizatie Desalegn; Sisay, Zewdie Habtie; Gebeyhu, Genet Worku; Kidus, Fisha Haileslassie. Discover Applied Sciences, 2024, 6(8).
  • [9] Enhancing Dynagraph Card Classification in Pumping Systems Using Transfer Learning and the Swin Transformer Model. Dong, Guoqing; Li, Weirong; Dong, Zhenzhen; Wang, Cai; Qian, Shihao; Zhang, Tianyang; Ma, Xueling; Zou, Lu; Lin, Keze; Liu, Zhaoxia. Applied Sciences-Basel, 2024, 14(4).
  • [10] WBC YOLO-ViT: 2 way-2 stage white blood cell detection and classification with a combination of YOLOv5 and vision transformer. Tarimo, Servas Adolph; Jang, Mi-Ae; Ngasa, Emmanuel Edward; Shin, Hee Bong; Shin, Hyojin; Woo, Jiyoung. Computers in Biology and Medicine, 2024, 169.