Multimodal Alignment and Attention-Based Person Search via Natural Language Description

Cited by: 9
Authors
Ji, Zhong [1]
Li, Shengjia [1]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Natural languages; Visualization; Internet of Things; Cameras; Sensors; Surveillance; Attention mechanism (AM); natural language description; person search; Visual Internet of Things (VIoT); NETWORK;
DOI
10.1109/JIOT.2020.2995148
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Visual Internet of Things (VIoT) has been widely deployed in the field of social security. However, making it intelligent remains an urgent yet challenging task. In this article, we address the task of searching for persons with a natural language description query in a public safety surveillance system, which is a practical and demanding technique in VIoT. It is a fine-grained many-to-many cross-modal problem and is more challenging than search with an image or an attribute as the query. Existing attempts are still weak at bridging the semantic gap between the visual modality from different camera sensors and the text modality from natural language descriptions. We propose a deep person search approach with a natural language description query that employs an attention mechanism (AM) and a multimodal alignment (MA) method to supervise the cross-modal mapping. Specifically, the AM consists of two self-attention modules and one cross-attention module: the former learn discriminative representations, while the latter lets the two modalities supervise each other with their own information, offering accurate guidance toward a common space. The MA approach contains three alignment processes with a novel cross-ranking loss function that makes different matching pairs separable in the common space. Extensive experiments on the large-scale CUHK-PEDES dataset demonstrate the superiority of the proposed approach.
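The abstract does not define the cross-ranking loss, so the following is only a rough illustration of the general idea of making matched and mismatched image-text pairs separable in a common space. It uses a generic bidirectional hinge ranking loss, which is a standard technique and not the loss proposed in the article. The function names, the margin value, and the one-to-one pairing of `image_embs[i]` with `text_embs[i]` are all assumptions made for this sketch (the actual task is many-to-many).

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bidirectional_ranking_loss(image_embs, text_embs, margin=0.2):
    """Generic hinge ranking loss over a batch (hypothetical helper).

    Assumes image_embs[i] and text_embs[i] describe the same person and
    every other pairing in the batch is a mismatch.
    """
    n = len(image_embs)
    # sim[i][j] = similarity of image i and text j in the common space.
    sim = [[cosine(img, txt) for txt in text_embs] for img in image_embs]
    loss = 0.0
    for i in range(n):
        pos = sim[i][i]
        for j in range(n):
            if j == i:
                continue
            # Image i should be closer to its own text than to text j.
            loss += max(0.0, margin - pos + sim[i][j])
            # Text i should be closer to its own image than to image j.
            loss += max(0.0, margin - pos + sim[j][i])
    return loss / n


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    aligned_texts = [[1.0, 0.0], [0.0, 1.0]]
    swapped_texts = [[0.0, 1.0], [1.0, 0.0]]
    print(bidirectional_ranking_loss(images, aligned_texts))
    print(bidirectional_ranking_loss(images, swapped_texts))
```

In this toy setup the loss is zero when each image embedding coincides with its own description and grows when the descriptions are swapped, which is the separability property the abstract refers to.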
Pages: 11147 - 11156
Number of pages: 10
Related papers
50 in total
  • [21] Visual Interrogation of Attention-Based Models for Natural Language Inference and Machine Comprehension
    Liu, Shusen
    Li, Tao
    Li, Zhimin
    Srikumar, Vivek
    Pascucci, Valerio
    Bremer, Peer-Timo
    CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018): PROCEEDINGS OF SYSTEM DEMONSTRATIONS, 2018 : 36 - 41
  • [22] BERT for the Processing of Radiological Reports: An Attention-based Natural Language Processing Algorithm
    Soffer, Shelly
    Glicksberg, Benjamin S.
    Zimlichman, Eyal
    Klang, Eyal
    ACADEMIC RADIOLOGY, 2022, 29 (04) : 634 - 635
  • [23] Multimodal Emotion Detection via Attention-Based Fusion of Extracted Facial and Speech Features
    Mamieva, Dilnoza
    Abdusalomov, Akmalbek Bobomirzaevich
    Kutlimuratov, Alpamis
    Muminov, Bahodir
    Whangbo, Taeg Keun
    SENSORS, 2023, 23 (12)
  • [24] Person image generation with attention-based injection network
    Liu, Meichen
    Wang, Kejun
    Ji, Ruihang
    Ge, Shuzhi Sam
    Chen, Jing
    NEUROCOMPUTING, 2021, 460 : 345 - 359
  • [25] Person Tube Retrieval via Language Description
    Fan, Hehe
    Yang, Yi
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 10754 - 10761
  • [26] PRUNING SUBSEQUENCE SEARCH WITH ATTENTION-BASED EMBEDDING
    Raffel, Colin
    Ellis, Daniel P. W.
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016 : 554 - 558
  • [27] Hierarchical attention-based multimodal fusion for video captioning
    Wu, Chunlei
    Wei, Yiwei
    Chu, Xiaoliang
    Weichen, Sun
    Su, Fei
    Wang, Leiquan
    NEUROCOMPUTING, 2018, 315 : 362 - 370
  • [28] Person Search Based on Attention Mechanism
    Huang, Zhongjie
    Sun, Songlin
    Liu, Yuhao
    ISCIT 2019: PROCEEDINGS OF 2019 19TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES (ISCIT), 2019 : 555 - 558
  • [29] Search Activities for Innovation: An Attention-Based View
    Tseng, Chuan-Chuan
    Fang, Shih-Chieh
    Chiu, Yen-Ting Helena
    INTERNATIONAL JOURNAL OF BUSINESS, 2011, 16 (01) : 51 - 70
  • [30] Coherent Dialogue with Attention-Based Language Models
    Mei, Hongyuan
    Bansal, Mohit
    Walter, Matthew R.
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017 : 3252 - 3258