A Cross-Modal Guiding and Fusion Method for Multi-Modal RSVP-based Image Retrieval

Cited by: 0
Authors:
Mao, Jiayu [1 ,2 ]
Qiu, Shuang [2 ]
Li, Dan [1 ,2 ]
Wei, Wei [1 ,2 ]
He, Huiguang [1 ,2 ,3 ]
Affiliations:
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[2] Chinese Acad Sci, Res Ctr Brain Inspired Intelligence, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[3] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing 100190, Peoples R China
Funding:
National Natural Science Foundation of China
Keywords:
Deep learning; convolutional neural network; electroencephalography; eye movement; rapid serial visual presentation;
DOI
10.1109/IJCNN52387.2021.9534465
CLC Classification:
TP18 [Artificial Intelligence Theory]
Discipline Codes:
081104; 0812; 0835; 1405
Abstract:
Rapid Serial Visual Presentation (RSVP) is an important paradigm in Brain-Computer Interface (BCI) research. It can be used for spelling, image retrieval, anomaly detection, etc. The RSVP paradigm embeds a small number of target pictures in a rapidly presented picture sequence to evoke specific event-related potential (ERP) components. However, the application of RSVP-based BCIs is limited by the accuracy of ERP detection. The goal of this study is therefore to introduce additional related modalities into the traditional EEG-based BCI to make robust predictions and improve detection performance. First, we introduce the eye movement modality into the RSVP-based BCI and simultaneously collect a multi-modality RSVP-based dataset during an image retrieval task. Second, we design a simple but efficient CNN-based network with two modality fusion modules that exploit the multi-modality data in two stages. In the feature extraction stage, we propose a Cross-modality-Guided Feature Calibration (cm-GFC) module that lets the EEG modality features modify the eye movement modality features, with the aim of making the two modalities' features more complementary. In the feature fusion stage, we propose a Dynamic Gated Fusion (DGF) module, which applies modality-specific gates to retain the complementary information of the two modalities and reduce their redundant information. To evaluate our method, we conduct extensive experiments on a dataset with EEG and eye movement data from 20 subjects. The proposed method achieves a high balanced classification accuracy of 87.83 +/- 2.31%, outperforming a series of single-modality and multi-modality approaches.
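The abstract's two-stage fusion idea (EEG-guided calibration of eye movement features, then gated fusion) can be illustrated with a minimal NumPy sketch. Note this is an assumption-laden illustration, not the paper's actual implementation: the abstract does not specify the module internals, so the feature dimensions, the random projection weights, and the sigmoid gating form used below are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical feature tensors: 4 trials, 64-d EEG and eye movement features.
eeg = rng.standard_normal((4, 64))
eye = rng.standard_normal((4, 64))

# --- cm-GFC (sketch): the EEG features produce a calibration vector in (0, 1)
# that rescales the eye movement features channel-wise, so the eye movement
# representation is guided by (and made complementary to) the EEG modality.
W_cal = 0.1 * rng.standard_normal((64, 64))
calib = sigmoid(eeg @ W_cal)
eye_calibrated = eye * calib

# --- DGF (sketch): modality-specific gates, computed from the concatenated
# features, weight each modality before summation, keeping complementary
# information and suppressing redundant channels.
W_gate_eeg = 0.1 * rng.standard_normal((128, 64))
W_gate_eye = 0.1 * rng.standard_normal((128, 64))
joint = np.concatenate([eeg, eye_calibrated], axis=1)
gate_eeg = sigmoid(joint @ W_gate_eeg)
gate_eye = sigmoid(joint @ W_gate_eye)
fused = gate_eeg * eeg + gate_eye * eye_calibrated  # shape (4, 64)
```

In a trained network the random projections above would be learned layers, and the fused representation would feed a classifier that outputs the target/non-target ERP decision.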
Pages: 7