Image Retrieval Using Convolutional Autoencoder, InfoGAN, and Vision Transformer Unsupervised Models

Cited by: 6
Authors
Sabry, Eman S. [1 ]
Elagooz, Salah S. [1 ]
Abd El-Samie, Fathi E. [2 ]
El-Shafai, Walid [2 ,3 ]
El-Bahnasawy, Nirmeen A. [4 ]
El-Banby, Ghada M. [5 ]
Algarni, Abeer D. [6 ]
Soliman, Naglaa F. [6 ]
Ramadan, Rabie A. [7 ]
Affiliations
[1] El Shorouk Acad, Higher Inst Engn, Dept Commun & Comp Engn, El Shorouk 11837, Egypt
[2] Menoufia Univ, Fac Elect Engn, Dept Elect & Elect Commun Engn, Menoufia 32952, Egypt
[3] Prince Sultan Univ, Comp Sci Dept, Secur Engn Lab, Riyadh 11586, Saudi Arabia
[4] Menoufia Univ, Fac Elect Engn, Comp Sci & Engn Dept, Menoufia 32952, Egypt
[5] Menoufia Univ, Fac Elect Engn, Dept Ind Elect & Control Engn, Menoufia 32952, Egypt
[6] Princess Nourah Bint Abdulrahman Univ, Coll Comp & Informat Sci, Dept Informat Technol, Riyadh 11671, Saudi Arabia
[7] Cairo Univ, Coll Engn, Comp Engn Dept, Giza 12613, Egypt
Keywords
Feature extraction; InfoGAN; sketched-real image retrieval; object matching; spatial distance measurement; vision transformer; 3D VIDEO COMMUNICATION; ALGORITHMS;
DOI
10.1109/ACCESS.2023.3241858
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Query by Image Content (QBIC), subsequently known as Content-Based Image Retrieval (CBIR), offers an advantageous solution in a variety of applications, including medical, meteorological, and search-by-image applications. Such CBIR systems primarily use similarity matching algorithms to compare image content and retrieve matching images from datasets. They essentially measure the spatial distance between the visual features extracted from a query image and those of similar images in the dataset. One of the most challenging query retrieval problems is Facial Sketched-Real Image Retrieval (FSRIR), which is based on content similarity matching. These facial retrieval systems are employed in a variety of contexts, including criminal justice. The difficulty of retrieving such images stems from the composition of the human face and its distinctive parts. In addition, the comparison between these types of images is made across two different domains. Moreover, to our knowledge, there are few large-scale facial datasets that can be used to assess the performance of retrieval systems. The success of the retrieval process is governed by the method used to estimate similarity and by how efficiently the compared images are represented. Representing visual features effectively might therefore resolve the main challenge in such systems. Hence, this paper makes several contributions that address the research gap in content-based similarity matching and retrieval. The first contribution is extending the Chinese University Face Sketch (CUFS) dataset by including augmented images, introducing to the community a novel dataset named Extended Sketched-Real Image Retrieval (ESRIR). The CUFS dataset has been extended from 100 images to include 53,000 facial sketches and 53,000 real facial images.
The paper's second contribution is to present three new systems for sketched-real image retrieval based on convolutional autoencoder, InfoGAN, and Vision Transformer (ViT) unsupervised models for large datasets. Furthermore, to meet the subjective demands of users arising from the prevalence of multiple query formats, the third contribution of the paper is to train and assess the performance of the proposed models on two additional facial datasets of different image types. Recently, the majority of people have preferred searching for brand logo images, but it may be tricky to separate certain brand logo features from their alternatives and even from other features in an image. Thus, the fourth contribution is to compare logo image retrieval performance based on visual features derived from each of the three suggested retrieval systems. The paper also presents cloud-based approaches for saving energy and computational complexity on large-scale datasets. Due to the ubiquity of touchscreen devices, users often make drawings from their imagination when searching for images of certain objects. Thus, the proposed models are tested and assessed on a challenging dataset of human-drawn doodles. They are also studied on a multi-category dataset to cover practically all possible image types and situations. The results are compared with those of the most recent algorithms found in the literature and show that the proposed systems outperform their recent counterparts.
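The similarity matching described in the abstract (measuring the spatial distance between the features of a query image and those of dataset images) can be illustrated with a minimal sketch. This is not the paper's implementation: the Euclidean metric, the function names `euclidean_distance` and `retrieve`, the image identifiers, and the 4-dimensional feature vectors are all assumptions made for illustration. In a real system the vectors would come from a trained feature extractor such as an autoencoder, InfoGAN, or ViT model.

```python
import math


def euclidean_distance(a, b):
    """Spatial (L2) distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query_features, gallery, top_k=3):
    """Rank gallery images by distance to the query, closest first.

    `gallery` maps an image identifier to its feature vector.
    Returns a list of (image_id, distance) pairs of length <= top_k.
    """
    ranked = sorted(
        (euclidean_distance(query_features, feats), image_id)
        for image_id, feats in gallery.items()
    )
    return [(image_id, dist) for dist, image_id in ranked[:top_k]]


# Hypothetical features for three real photos and one sketch query.
gallery = {
    "photo_001": [0.90, 0.10, 0.30, 0.70],
    "photo_002": [0.20, 0.80, 0.50, 0.10],
    "photo_003": [0.85, 0.15, 0.35, 0.65],
}
sketch_query = [0.88, 0.12, 0.32, 0.68]

results = retrieve(sketch_query, gallery, top_k=2)
for image_id, dist in results:
    print(f"{image_id}: distance {dist:.3f}")
```

A smaller distance means a closer match, so the first entry of `results` is the retrieved image. Other metrics (for example cosine distance) could be swapped in by replacing `euclidean_distance`.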
Pages: 20445-20477
Number of pages: 33