Multimodal Alignment and Attention-Based Person Search via Natural Language Description

Cited by: 9

Authors:

Ji, Zhong [1]

Li, Shengjia [1]

Affiliations:

[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China

Source:

IEEE INTERNET OF THINGS JOURNAL | 2020 / Vol. 7 / No. 11

Funding:

National Natural Science Foundation of China;

Keywords:

Task analysis; Natural languages; Visualization; Internet of Things; Cameras; Sensors; Surveillance; Attention mechanism (AM); natural language description; person search; Visual Internet of Things (VIoT); NETWORK;

DOI:

10.1109/JIOT.2020.2995148

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Visual Internet of Things (VIoT) has been widely deployed in the field of social security. However, how to enable it to be intelligent is an urgent yet challenging task. In this article, we address the task of searching persons with natural language description query in a public safety surveillance system, which is a practical and demanding technique in VIoT. It is a fine-grained many-to-many cross-modal problem and more challenging than those with the image and the attribute as queries. The existing attempts are still weak in bridging the semantic gap between visual modality from different camera sensors and text modality from natural language descriptions. We propose a deep person search approach with a natural language description query by employing the attention mechanism (AM) and multimodal alignment (MA) method to supervise the cross-modal mapping. Particularly, the AM consists of two self-attention modules and one cross-attention module, where the former aims at learning discriminative representations and the latter supervises each other with their own information to offer accurate guidance to a common space. The MA approach contains three alignment processes with a novel cross-ranking loss function to make different matching pairs separable in a common space. Extensive experiments on large-scale CUHK-PEDES demonstrate the superiority of the proposed approach.

Pages: 11147-11156

Page count: 10
