Multimodal Alignment and Attention-Based Person Search via Natural Language Description

Cited by: 9
Authors
Ji, Zhong [1]
Li, Shengjia [1]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Natural languages; Visualization; Internet of Things; Cameras; Sensors; Surveillance; Attention mechanism (AM); natural language description; person search; Visual Internet of Things (VIoT); NETWORK;
DOI
10.1109/JIOT.2020.2995148
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Visual Internet of Things (VIoT) has been widely deployed in the field of social security. However, making it intelligent remains an urgent yet challenging task. In this article, we address the task of searching for persons with a natural language description query in a public safety surveillance system, which is a practical and demanding technique in VIoT. It is a fine-grained many-to-many cross-modal problem and is more challenging than search with an image or an attribute as the query. Existing attempts are still weak in bridging the semantic gap between the visual modality from different camera sensors and the text modality from natural language descriptions. We propose a deep person search approach with a natural language description query that employs an attention mechanism (AM) and a multimodal alignment (MA) method to supervise the cross-modal mapping. In particular, the AM consists of two self-attention modules and one cross-attention module: the former learn discriminative representations, while the latter lets the two modalities supervise each other with their own information to offer accurate guidance to a common space. The MA approach contains three alignment processes with a novel cross-ranking loss function to make different matching pairs separable in a common space. Extensive experiments on the large-scale CUHK-PEDES dataset demonstrate the superiority of the proposed approach.
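The abstract does not give the form of the cross-ranking loss, so the following is only a minimal sketch of a generic bidirectional hinge ranking loss of the kind commonly used to separate matched and unmatched image-text pairs in a common space. It is not the authors' loss. The function names, the cosine similarity, the `margin` value, and the assumption that `image_embs[i]` and `text_embs[i]` form the only matched pair (a simplification of the many-to-many setting) are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bidirectional_ranking_loss(image_embs, text_embs, margin=0.2):
    """Generic hinge ranking loss over a batch (illustrative, not the paper's loss).

    Assumes image_embs[i] and text_embs[i] are a matched pair and that
    every other combination in the batch is a mismatch.
    """
    n = len(image_embs)
    # sim[i][j] = similarity of image i and text j in the common space
    sim = [[cosine(img, txt) for txt in text_embs] for img in image_embs]
    loss = 0.0
    for i in range(n):
        pos = sim[i][i]
        for j in range(n):
            if j == i:
                continue
            # image-to-text direction: image i against the wrong text j
            loss += max(0.0, margin - pos + sim[i][j])
            # text-to-image direction: text i against the wrong image j
            loss += max(0.0, margin - pos + sim[j][i])
    return loss / n


# Hypothetical 2-D embeddings: matched pairs point in the same direction.
aligned = bidirectional_ranking_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[1.0, 0.0], [0.0, 1.0]])
# Same images, but the descriptions are swapped between the two persons.
swapped = bidirectional_ranking_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[0.0, 1.0], [1.0, 0.0]])
```

With well-separated matched pairs the loss is zero, and it grows when a mismatched pair scores higher than the matched one, which is the sense in which a ranking loss makes matching pairs separable.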
Pages: 11147-11156
Page count: 10
Related papers
50 in total
  • [41] Improving Natural Language Person Description Search from Videos with Language Model Fine-Tuning and Approximate Nearest Neighbor
    Yuenyong, Sumeth
    Wongpatikaseree, Konlakorn
    [J]. BIG DATA AND COGNITIVE COMPUTING, 2022, 6 (04)
  • [42] Person Entity Alignment Method Based on Multimodal Information Aggregation
    Wang, Huansha
    Huang, Ruiyang
    Zhang, Jianpeng
    [J]. ELECTRONICS, 2022, 11 (19)
  • [43] Mutual Learning Person Search Based on Region Alignment
    Zhan, Li
    Wang, Zhiwen
    Lin, Yuehang
    Li, Ruirui
    Li, Ye
    [J]. PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS, VOL II, CENET 2023, 2024, 1126 : 355 - 365
  • [44] An Attention-Based Multimodal Siamese Architecture for Tweet-User Verification
    Suman, Chanchal
    Saha, Sriparna
    Bhattacharyya, Pushpak
    [J]. IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2023, 10 (05) : 2764 - 2772
  • [45] MAC: multimodal, attention-based cybersickness prediction modeling in virtual reality
    Jeong, Dayoung
    Paik, Seungwon
    Noh, YoungTae
    Han, Kyungsik
    [J]. VIRTUAL REALITY, 2023, 27 (03) : 2315 - 2330
  • [46] Multimodal attention-based deep learning for Alzheimer's disease diagnosis
    Golovanevsky, Michal
    Eickhoff, Carsten
    Singh, Ritambhara
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2022, 29 (12) : 2014 - 2022
  • [47] Multimodal Brain Image Segmentation and Analysis with Neuromorphic Attention-Based Learning
    Han, Woo-Sup
    Han, Il Song
    [J]. BRAINLESION: GLIOMA, MULTIPLE SCLEROSIS, STROKE AND TRAUMATIC BRAIN INJURIES (BRAINLES 2019), PT II, 2020, 11993 : 14 - 26
  • [48] Attention-Based Multimodal Entity Linking with High-Quality Images
    Zhang, Li
    Li, Zhixu
    Yang, Qiang
    [J]. DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2021), PT II, 2021, 12682 : 533 - 548
  • [49] Attention-Based Multimodal Neural Network for Automatic Evaluation of Press Conferences
    Yi, Shengzhou
    Mochitomi, Koshiro
    Suzuki, Isao
    Wang, Xueting
    Yamasaki, Toshihiko
    [J]. INTERNATIONAL JOURNAL OF MULTIMEDIA DATA ENGINEERING & MANAGEMENT, 2020, 11 (03) : 1 - 19