Attention-Based Deep Reinforcement Learning for Virtual Cinematography of 360° Videos

Cited by: 5
Authors
Wang, Jianyi [1]
Xu, Mai [1]
Jiang, Lai [1]
Song, Yuhang [2]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing 100191, Peoples R China
[2] Univ Oxford, Somerville Coll, Dept Comp Sci, Oxford OX2 6HD, England
基金
北京市自然科学基金;
关键词
360 degrees video; attention; deep reinforcement learning; SALIENCY PREDICTION; MODEL; IMAGES; HEAD; EYE; 2D;
D O I
10.1109/TMM.2020.3021984
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Virtual cinematography refers to automatically selecting a natural-looking normal field-of-view (NFOV) from an entire 360° video. It can be modeled as a deep reinforcement learning (DRL) problem, in which an agent takes actions related to NFOV selection according to the environment of 360° video frames. More importantly, we find from our data analysis that the selected NFOVs attract significantly more attention than other regions, i.e., the NFOVs have high saliency. Therefore, in this paper, we propose an attention-based DRL (A-DRL) approach for virtual cinematography in 360° video. Specifically, we develop a new DRL framework for automatic NFOV selection that takes both the content and the saliency map of each 360° frame as input. We then propose a new reward function for this framework, which considers the saliency values, the ground truth, and smooth transitions in NFOV selection. Subsequently, a simplified DenseNet (called Mini-DenseNet) is designed to learn the optimal policy by maximizing the reward. Based on the learned policy, our A-DRL approach selects the NFOV actions for virtual cinematography of 360° video. Extensive experiments show that our A-DRL approach outperforms other state-of-the-art virtual cinematography methods on the Sports-360 video and Pano2Vid datasets.
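The abstract names three ingredients of the reward (saliency values, ground truth, and smooth transition) but gives no formula. The sketch below is only an illustration of how such a three-term reward could be combined; it is not the paper's definition. The function names `nfov_reward` and `wrapped_dist`, the window size, the weights, and the additive form are all assumptions made for this example.

```python
import numpy as np


def wrapped_dist(a, b, shape):
    """Normalized distance between two (row, col) points on an
    equirectangular frame; columns wrap around horizontally."""
    h, w = shape
    dr = abs(a[0] - b[0]) / h
    dc = abs(a[1] - b[1])
    dc = min(dc, w - dc) / w  # shorter way around the 360° panorama
    return float(np.hypot(dr, dc))


def nfov_reward(saliency, center, prev_center, gt_center,
                half_size=(20, 40), weights=(1.0, 1.0, 0.5)):
    """Hypothetical reward for one NFOV choice (not the paper's formula).

    saliency    : 2D array, saliency map of the current frame
    center      : (row, col) of the chosen NFOV center
    prev_center : (row, col) of the NFOV center in the previous frame
    gt_center   : (row, col) of the ground-truth NFOV center
    """
    h, w = saliency.shape
    hr, hc = half_size
    w_sal, w_gt, w_smooth = weights

    # Mean saliency inside the NFOV window: rows are clamped at the
    # top/bottom edges, columns wrap around the panorama.
    rows = np.clip(np.arange(center[0] - hr, center[0] + hr + 1), 0, h - 1)
    cols = np.arange(center[1] - hc, center[1] + hc + 1) % w
    sal_term = float(saliency[np.ix_(rows, cols)].mean())

    # Penalize distance from the ground truth and large jumps
    # between consecutive frames.
    gt_term = wrapped_dist(center, gt_center, (h, w))
    smooth_term = wrapped_dist(center, prev_center, (h, w))

    return w_sal * sal_term - w_gt * gt_term - w_smooth * smooth_term
```

Under these assumptions, an NFOV centered on a salient region, close to the ground truth, and close to the previous NFOV scores higher than one that jumps to a non-salient region far away.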
Pages: 3227-3238
Page count: 12