All You Can Embed: Natural Language based Vehicle Retrieval with Spatio-Temporal Transformers

Cited by: 6
Authors
Scribano, Carmelo [1, 3]
Sapienza, Davide [1, 3]
Franchini, Giorgia [1, 2]
Verucchi, Micaela [1 ]
Bertogna, Marko [1 ]
Affiliations
[1] Univ Modena & Reggio Emilia, Modena, Italy
[2] Univ Ferrara, Ferrara, Italy
[3] Univ Parma, Parma, Italy
DOI
10.1109/CVPRW53098.2021.00481
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Combining Natural Language with Vision represents a unique and interesting challenge in the domain of Artificial Intelligence. The AI City Challenge Track 5 for Natural Language-Based Vehicle Retrieval focuses on the problem of combining visual and textual information, applied to a smart-city use case. In this paper, we present All You Can Embed (AYCE), a modular solution to correlate single-vehicle tracking sequences with natural language. The main building blocks of the proposed architecture are (i) BERT, which provides an embedding of the textual descriptions, and (ii) a convolutional backbone along with a Transformer model, which embed the visual information. For the training of the retrieval model, a variation of the Triplet Margin Loss is proposed to learn a distance measure between the visual and language embeddings. The code is publicly available at https://github.com/cscribano/AYCE_2021.
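The abstract says the retrieval model is trained with a variation of the Triplet Margin Loss but does not spell that variation out. As a point of reference only, the sketch below computes the standard triplet margin loss, max(d(a, p) - d(a, n) + margin, 0), with a Euclidean distance. The function names, the margin of 1.0, and the toy two-dimensional embeddings are assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # assumed value, not from the paper
) -> float:
    """Standard triplet margin loss (NOT the paper's variation)."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(gap + margin, 0.0)


# Hypothetical toy embeddings, invented for this example.
text_embedding = [1.0, 0.0]  # anchor: an embedded description
matching_track = [1.0, 0.0]  # positive: the described vehicle track
other_track = [0.0, 2.0]     # negative: an unrelated vehicle track

# Positive is already closer than the negative by more than the margin.
loss_easy = triplet_margin_loss(text_embedding, matching_track, other_track)

# Roles swapped: the "positive" is farther away, so the loss is large.
loss_hard = triplet_margin_loss(text_embedding, other_track, matching_track)
```

In this toy setup the loss is zero once the matching track sits closer to the description than the unrelated track by at least the margin, and grows when that ordering is violated, which is the behaviour a distance-learning objective of this family relies on.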
Pages: 4248 - 4257
Number of pages: 10
Related papers
50 entries in total
  • [41] DAKRS: Domain Adaptive Knowledge-Based Retrieval System for Natural Language-Based Vehicle Retrieval
    Ha, Synh Viet-Uyen
    Le, Huy Dinh-Anh
    Nguyen, Quang Qui-Vinh
    Chung, Nhat Minh
    IEEE ACCESS, 2023, 11 : 90951 - 90965
  • [42] OMG: Observe Multiple Granularities for Natural Language-Based Vehicle Retrieval
    Du, Yunhao
    Zhang, Binyu
    Ruan, Xiangning
    Su, Fei
    Zhao, Zhicheng
    Chen, Hong
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2022, 2022 : 3123 - 3132
  • [43] Towards Accurate Visual and Natural Language-Based Vehicle Retrieval Systems
    Khorramshahi, Pirazh
    Rambhatla, Sai Saketh
    Chellappa, Rama
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021 : 4178 - 4187
  • [44] Multi-level Matching of Natural Language-Based Vehicle Retrieval
    Liu, Ying
    Zhang, Zhongshuai
    Yang, Xiaochun
    WEB AND BIG DATA, PT III, APWEB-WAIM 2023, 2024, 14333 : 358 - 372
  • [45] A Composition-Based Image Retrieval Method for Environment-Visualization with Images and Spatio-Temporal Information
    Toyoshima, Yuka
    Hayashi, Yasuhiro
    Kiyoki, Yasushi
    2018 INTERNATIONAL ELECTRONICS SYMPOSIUM ON KNOWLEDGE CREATION AND INTELLIGENT COMPUTING (IES-KCIC), 2018 : 90 - 97
  • [46] The Retrieval of Forest and Grass Fractional Vegetation Coverage in Mountain Regions Based on Spatio-Temporal Transfer Learning
    Huang, Yuxuan
    Zhou, Xiang
    Lv, Tingting
    Tao, Zui
    Zhang, Hongming
    Li, Ruoxi
    Zhai, Mingjian
    Liang, Houyu
    REMOTE SENSING, 2023, 15 (19)
  • [47] Spatio-temporal modeling of moving objects for content- and semantic-based retrieval in video data
    Shim, CB
    Shin, YW
    KNOWLEDGE-BASED INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, PT 4, PROCEEDINGS, 2005, 3684 : 343 - 351
  • [48] Multivariate time series modeling of geometric features of spatio-temporal volumes for content based video retrieval
    Chattopadhyay C.
    Maurya A.K.
    International Journal of Multimedia Information Retrieval, 2014, 3 (1) : 15 - 28
  • [49] Conditional deep clustering based transformed spatio-temporal features and fused distance for efficient video retrieval
    Banerjee A.
    Kumar E.
    Ravinder M.
    International Journal of Information Technology, 2023, 15 (5) : 2349 - 2355
  • [50] Contrastive Language-Video Learning Model Based on Spatio-Temporal Information Auxiliary Supervision
    Zhang, Bing-Bing
    Zhang, Jian-Xin
    Li, Pei-Hua
    Jisuanji Xuebao/Chinese Journal of Computers, 2024, 47 (08) : 1769 - 1785