Audio-visual saliency prediction for movie viewing in immersive environments: Dataset and benchmarks

Cited by: 1
Authors
Chen, Zhao [1 ]
Zhang, Kao [1 ,2 ]
Cai, Hao [1 ]
Ding, Xiaoying [3 ]
Jiang, Chenxi [1 ]
Chen, Zhenzhong [1 ]
Affiliations
[1] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan 430079, Peoples R China
[2] Nanjing Univ Informat Sci & Technol, Sch Artificial Intelligence, Sch Future Technol, Nanjing 210044, Peoples R China
[3] Zhongnan Univ Econ & Law, Sch Informat & Safety Engn, Wuhan 430073, Peoples R China
Funding
National Natural Science Foundation of China; China Postdoctoral Science Foundation;
Keywords
Saliency prediction; Visual attention; Movie viewing; Virtual reality; INTEGRATION; GAZE;
DOI
10.1016/j.jvcir.2024.104095
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
In this paper, an eye-tracking dataset of movie viewing in an immersive environment is developed, containing 256 movie clips at 2K QHD resolution with corresponding movie genre labels from IMDb (Internet Movie Database). The dataset provides audio-visual cues for studying human visual attention when watching movies with a VR headset, with eye movements recorded by the headset's integrated eye tracker. To provide benchmarks for saliency prediction for movie viewing in the immersive environment, fifteen computational models are evaluated on the dataset, including a newly developed multi-stream audio-visual saliency prediction model based on deep neural networks, named MSAV. Detailed quantitative and qualitative comparisons and analyses are also provided. The developed dataset and benchmarks could facilitate studies of visual saliency prediction for movie viewing in immersive environments.
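As context for the quantitative comparisons mentioned in the abstract, the sketch below computes two evaluation metrics that are standard in the saliency-prediction literature, the linear correlation coefficient (CC) and the normalized scanpath saliency (NSS). This is a minimal illustration under our own assumptions (NumPy arrays of matching shape, a binary fixation map); the paper does not publish its evaluation code here, and the function names are ours, not taken from the paper.

    import numpy as np

    def cc(pred: np.ndarray, gt: np.ndarray) -> float:
        """Linear correlation coefficient between a predicted and a
        ground-truth saliency map (Pearson r of the z-scored maps)."""
        pred = (pred - pred.mean()) / (pred.std() + 1e-8)
        gt = (gt - gt.mean()) / (gt.std() + 1e-8)
        return float(np.mean(pred * gt))

    def nss(pred: np.ndarray, fixations: np.ndarray) -> float:
        """Normalized scanpath saliency: mean z-scored prediction value
        at fixated pixels. `fixations` is a binary map with 1 at recorded
        fixation locations; it must contain at least one fixation."""
        pred = (pred - pred.mean()) / (pred.std() + 1e-8)
        return float(pred[fixations > 0].mean())

    # Hypothetical usage: a random prediction and a single central fixation.
    h, w = 64, 64
    pred_map = np.random.rand(h, w)
    fix_map = np.zeros((h, w))
    fix_map[h // 2, w // 2] = 1
    print(cc(pred_map, pred_map))   # ~1.0 (a map correlates perfectly with itself)
    print(nss(pred_map, fix_map))

Higher is better for both metrics; benchmark papers in this area typically report them alongside AUC- and KL-divergence-based scores.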
Pages: 12