Research on emotional semantic retrieval of attention mechanism oriented to audio-visual synesthesia

Cited by: 3
Authors
Wang, Weixing [1, 2]
Li, Qianqian [1]
Xie, Jingwen [2]
Hu, Ningfeng [1]
Wang, Ziao [1]
Zhang, Ning [2]
Affiliations
[1] Guizhou Univ, Sch Mech Engn, Guiyang, Peoples R China
[2] Minist Educ, Key Lab Adv Mfg Technol, Guiyang, Peoples R China
Keywords
Synesthesia; Audio-visual; Emotional semantics; Attention mechanism; Retrieval;
DOI
10.1016/j.neucom.2022.11.036
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Digital video is widely used to record daily life and share moods, but few researchers have studied the consistency of emotional expression between short videos and music. To match appropriate background music to short-video images autonomously and efficiently, the paper analyzed the emotional connection between the two from the perspective of audio-visual synesthesia. First, emotional semantics was used as a bridge to connect video data and music data, and a video-music synesthesia data set based on semantic words was constructed. Then, an attention mechanism was incorporated to better extract key features from video images. For music feature extraction, an improved LeNet-5 network was used, and the optimal network parameters were determined through experiments. Finally, the two types of features were fused and mutual retrieval between video and music was performed. To compare the performance of different models, several CNN models (VGG16, VGG19, AlexNet and GoogLeNet) were evaluated on the video images, and the attention mechanism was added to each network to compare retrieval accuracy. For the music data, different CNN algorithms were likewise used in comparative experiments, and networks with different numbers of layers were tested to determine the optimal configuration. The experimental results show that the emotion-based audio-visual synesthesia retrieval model can effectively measure the emotional similarity between video images and music, and that the paper's method produces a good match between them. The method is an exploration of computer synesthetic intelligence that can stimulate the creative inspiration of image and music designers. While enhancing the emotional experience of digital products, it also improves the efficiency and quality of development. (c) 2022 Elsevier B.V. All rights reserved.
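The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it describes: pool per-frame video features with attention weights, then rank music tracks by similarity to the pooled video vector. Everything in it is an assumption made for illustration, not the paper's method: the dot-product attention, the cosine similarity measure, the two-dimensional toy features, and the names `attention_pool`, `retrieve`, `calm_track` and `upbeat_track` are all hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frame_features, query):
    """Weighted sum of frame features; weights come from dot-product scores
    against `query` (a hypothetical stand-in for a learned attention vector)."""
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frame_features]
    weights = softmax(scores)
    dim = len(frame_features[0])
    return [sum(w * frame[i] for w, frame in zip(weights, frame_features))
            for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_vec, candidates, top_k=1):
    """Return the names of the top_k candidates most similar to query_vec."""
    ranked = sorted(candidates.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]


# Toy data (assumed): three frame features and two music embeddings
# that already live in the same 2-D "emotion" space.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
query = [1.0, 0.0]
video_vec = attention_pool(frames, query)

music_library = {"calm_track": [0.0, 1.0], "upbeat_track": [1.0, 0.1]}
best = retrieve(video_vec, music_library)
print(best)
```

Swapping the roles (a music vector as the query and a dictionary of video vectors as candidates) gives retrieval in the other direction, which is one simple way to read the "mutual retrieval" mentioned in the abstract.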
Pages: 194-204
Page count: 11
Related papers
50 in total
  • [1] The research on Digital audio-visual synesthesia
    Zhang, Yu
    Liu, Xiang
    [J]. 10TH INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE & EDUCATION (ICCSE 2015), 2015, : 186 - 189
  • [2] DEEP AUDIO-VISUAL SPEECH SEPARATION WITH ATTENTION MECHANISM
    Li, Chenda
    Qian, Yanmin
    [J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 7314 - 7318
  • [3] Visualized voices: A case study of audio-visual synesthesia
    Fernay, Louise
    Reby, David
    Ward, Jamie
    [J]. NEUROCASE, 2012, 18 (01) : 50 - 56
  • [4] Semantic Audio-Visual Navigation
    Chen, Changan
    Al-Halah, Ziad
    Grauman, Kristen
    [J]. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 15511 - 15520
  • [5] Audio-visual event detection based on mining of semantic audio-visual labels
    Goh, KS
    Miyahara, K
    Radhakrishan, R
    Xiong, ZY
    Divakaran, A
    [J]. STORAGE AND RETRIEVAL METHODS AND APPLICATIONS FOR MULTIMEDIA 2004, 2004, 5307 : 292 - 299
  • [6] A Turkish Audio-Visual Emotional Database
    Onder, Onur
    Zhalehpour, Sara
    Erdem, Cigdem Eroglu
    [J]. 2013 21ST SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2013,
  • [7] Audio-Visual Event Localization by Learning Spatial and Semantic Co-Attention
    Xue, Cheng
    Zhong, Xionghu
    Cai, Minjie
    Chen, Hao
    Wang, Wenwu
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 418 - 429
  • [8] Audio-visual speech processing and attention
    Sams, M
    [J]. PSYCHOPHYSIOLOGY, 2003, 40 : S5 - S6
  • [9] Emotional Sounds Guide Visual Attention to Emotional Pictures: An Eye-Tracking Study With Audio-Visual Stimuli
    Gerdes, Antje B. M.
    Alpers, Georg W.
    Braun, Hanna
    Koehler, Sabrina
    Nowak, Ulrike
    Treiber, Lisa
    [J]. EMOTION, 2021, 21 (04) : 679 - 692
  • [10] Audio-Visual Salieny Network with Audio Attention Module
    Cheng, Shuaiyang
    Gao, Xing
    Song, Liang
    Xiahou, Jianbing
    [J]. PROCEEDINGS OF 2021 2ND INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND INFORMATION SYSTEMS (ICAIIS '21), 2021,