All You Can Embed: Natural Language based Vehicle Retrieval with Spatio-Temporal Transformers

Times cited: 6

Authors:

Scribano, Carmelo [1,3]

Sapienza, Davide [1,3]

Franchini, Giorgia [1,2]

Verucchi, Micaela [1]

Bertogna, Marko [1]

Affiliations:

[1] Univ Modena & Reggio Emilia, Modena, Italy

[2] Univ Ferrara, Ferrara, Italy

[3] Univ Parma, Parma, Italy

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021 | 2021

DOI:

10.1109/CVPRW53098.2021.00481

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Combining Natural Language with Vision represents a unique and interesting challenge in the domain of Artificial Intelligence. The AI City Challenge Track 5 for Natural Language-Based Vehicle Retrieval focuses on the problem of combining visual and textual information, applied to a smart-city use case. In this paper, we present All You Can Embed (AYCE), a modular solution to correlate single-vehicle tracking sequences with natural language. The main building blocks of the proposed architecture are (i) BERT to provide an embedding of the textual descriptions, (ii) a convolutional backbone along with a Transformer model to embed the visual information. For the training of the retrieval model, a variation of the Triplet Margin Loss is proposed to learn a distance measure between the visual and language embeddings. The code is publicly available at https://github.com/cscribano/AYCE_2021.
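The abstract describes learning a distance between text and vehicle-track embeddings with a variation of the Triplet Margin Loss. The exact variation is not specified in this record, so the sketch below shows only the standard triplet margin loss, with a text embedding as the anchor, plus nearest-neighbour ranking of tracks for a text query. The function names, the Euclidean distance and the default margin of 1.0 are assumptions for illustration, and plain lists of floats stand in for the BERT and CNN+Transformer embeddings.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """L2 distance between two embeddings of equal dimension."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor: Vector, positive: Vector, negative: Vector,
                        margin: float = 1.0) -> float:
    """Standard (not the paper's variant) triplet margin loss:
    max(d(anchor, positive) - d(anchor, negative) + margin, 0).

    Assumed roles: anchor = text embedding, positive = embedding of the
    matching vehicle track, negative = embedding of a non-matching track.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


def rank_tracks(query: Vector, tracks: List[Vector]) -> List[int]:
    """Return track indices sorted from closest to farthest from the text query."""
    return sorted(range(len(tracks)), key=lambda i: euclidean(query, tracks[i]))


if __name__ == "__main__":
    text = [0.0, 0.0]                               # toy text embedding
    tracks = [[3.0, 4.0], [0.0, 0.5], [1.0, 0.0]]   # toy track embeddings
    print(rank_tracks(text, tracks))                      # [1, 2, 0]
    print(triplet_margin_loss(text, tracks[1], tracks[0]))  # 0.0
    print(triplet_margin_loss(text, tracks[0], tracks[1]))  # 5.5
```

The loss is zero once the matching track is closer to the text than the non-matching one by at least the margin, which is what makes the learned distance usable for retrieval by ranking.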

Pages: 4248-4257

Number of pages: 10