Multimodal Alignment and Attention-Based Person Search via Natural Language Description

Cited by: 9
Authors
Ji, Zhong [1]
Li, Shengjia [1]
Affiliations
[1] Tianjin University, School of Electrical and Information Engineering, Tianjin 300072, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Natural languages; Visualization; Internet of Things; Cameras; Sensors; Surveillance; Attention mechanism (AM); natural language description; person search; Visual Internet of Things (VIoT); NETWORK;
DOI
10.1109/JIOT.2020.2995148
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Visual Internet of Things (VIoT) has been widely deployed in the field of social security. However, how to enable it to be intelligent is an urgent yet challenging task. In this article, we address the task of searching persons with natural language description query in a public safety surveillance system, which is a practical and demanding technique in VIoT. It is a fine-grained many-to-many cross-modal problem and more challenging than those with the image and the attribute as queries. The existing attempts are still weak in bridging the semantic gap between visual modality from different camera sensors and text modality from natural language descriptions. We propose a deep person search approach with a natural language description query by employing the attention mechanism (AM) and multimodal alignment (MA) method to supervise the cross-modal mapping. Particularly, the AM consists of two self-attention modules and one cross-attention module, where the former aims at learning discriminative representations and the latter supervises each other with their own information to offer accurate guidance to a common space. The MA approach contains three alignment processes with a novel cross-ranking loss function to make different matching pairs separable in a common space. Extensive experiments on large-scale CUHK-PEDES demonstrate the superiority of the proposed approach.
Pages: 11147-11156
Number of pages: 10