Using Text to Teach Image Retrieval

Cited by: 1
Authors
Dong, Haoyu [1]
Wang, Ze [2]
Qiu, Qiang [2]
Sapiro, Guillermo [1]
Affiliations
[1] Duke Univ, Durham, NC 27706 USA
[2] Purdue Univ, W Lafayette, IN 47907 USA
DOI
10.1109/CVPRW53098.2021.00180
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image retrieval relies heavily on the quality of the data modeling and the distance measurement in the feature space. Building on the concept of image manifold, we first propose to represent the feature space of images, learned via neural networks, as a graph. Neighborhoods in the feature space are now defined by the geodesic distance between images, represented as graph vertices or manifold samples. When limited images are available, this manifold is sparsely sampled, making the geodesic computation and the corresponding retrieval harder. To address this, we augment the manifold samples with geometrically aligned text, thereby using a plethora of sentences to teach us about images. In addition to extensive results on standard datasets illustrating the power of text to help in image retrieval, a new public dataset based on CLEVR is introduced to quantify the semantic similarity between visual data and text data. The experimental results show that the joint embedding manifold is a robust representation, allowing it to be a better basis to perform image retrieval given only an image and a textual instruction on the desired modifications over the image.
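The graph view of the feature space described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the k-nearest-neighbour graph construction, the use of Dijkstra's algorithm for geodesic distances, and the names `knn_graph`, `geodesic_distances`, and `retrieve` are all assumptions made for illustration. The toy 2-D points stand in for learned image features, and the text augmentation step is not covered.

```python
import heapq
import numpy as np


def knn_graph(features, k):
    """Build a symmetric k-nearest-neighbour graph (assumed construction)."""
    n = len(features)
    # Pairwise Euclidean distances between all feature vectors.
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    graph = {i: {} for i in range(n)}
    for i in range(n):
        # Closest k vertices, excluding the vertex itself.
        neighbours = [int(j) for j in np.argsort(dists[i]) if j != i][:k]
        for j in neighbours:
            graph[i][j] = float(dists[i, j])
            graph[j][i] = float(dists[i, j])
    return graph


def geodesic_distances(graph, source):
    """Shortest-path (geodesic) distance from source to every vertex (Dijkstra)."""
    dist = {v: float("inf") for v in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def retrieve(graph, query, top_k):
    """Rank all other vertices by geodesic distance to the query vertex."""
    dist = geodesic_distances(graph, query)
    ranked = sorted((v for v in graph if v != query), key=lambda v: dist[v])
    return ranked[:top_k]


# Toy "manifold": a U-shaped curve whose two ends are close in Euclidean
# space but far apart along the curve.
left = [(0.0, y) for y in (0.0, 0.5, 1.0, 1.5, 2.0)]
top = [(0.4, 2.0), (0.8, 2.0)]
right = [(1.2, y) for y in (2.0, 1.5, 1.0, 0.5, 0.0)]
points = np.array(left + top + right)

graph = knn_graph(points, k=2)
geodesic_top3 = retrieve(graph, query=0, top_k=3)

# Plain Euclidean ranking from the same query, for comparison.
euclid = np.linalg.norm(points - points[0], axis=1)
euclidean_top3 = [int(j) for j in np.argsort(euclid) if j != 0][:3]

print("geodesic: ", geodesic_top3)
print("euclidean:", euclidean_top3)
```

In this toy setup the Euclidean ranking places the opposite end of the U (vertex 11) among the three nearest neighbours of vertex 0, while the geodesic ranking returns only vertices on the same arm. With fewer samples along the curve, a k-nearest-neighbour graph would start linking the two arms directly, which is one way to picture the sparse-sampling problem the abstract mentions.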
Pages: 1643-1652
Number of pages: 10