Attention-Based Deep Reinforcement Learning for Virtual Cinematography of 360° Videos

Cited by: 5
Authors
Wang, Jianyi [1 ]
Xu, Mai [1 ]
Jiang, Lai [1 ]
Song, Yuhang [2 ]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing 100191, Peoples R China
[2] Univ Oxford, Somerville Coll, Dept Comp Sci, Oxford OX2 6HD, England
Funding
Beijing Natural Science Foundation
Keywords
360° video; attention; deep reinforcement learning; saliency prediction; model; images; head; eye; 2D
DOI
10.1109/TMM.2020.3021984
CLC number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
Virtual cinematography refers to automatically selecting a natural-looking normal field-of-view (NFOV) from an entire 360° video. Virtual cinematography can be modeled as a deep reinforcement learning (DRL) problem, in which an agent takes NFOV-selection actions according to the environment of 360° video frames. More importantly, our data analysis shows that the selected NFOVs attract significantly more attention than other regions, i.e., the NFOVs have high saliency. Therefore, in this paper, we propose an attention-based DRL (A-DRL) approach for virtual cinematography of 360° video. Specifically, we develop a new DRL framework for automatic NFOV selection that takes as input both the content and the saliency map of each 360° frame. Then, we propose a new reward function for the DRL framework, which considers the saliency values, the ground-truth NFOV, and the smoothness of transitions between selected NFOVs. Subsequently, a simplified DenseNet (called Mini-DenseNet) is designed to learn the optimal policy by maximizing the reward. Based on the learned policy, our A-DRL approach selects NFOVs for virtual cinematography of 360° video. Extensive experiments show that our A-DRL approach outperforms state-of-the-art virtual cinematography methods on the Sports-360 and Pano2Vid datasets.
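The abstract names three components of the reward: the saliency of the selected NFOV, its closeness to the ground-truth viewport, and a smooth transition from the previously selected viewport. The sketch below illustrates one plausible way such a reward could be composed; the weights `alpha`, `beta`, `gamma` and the great-circle distance helper are hypothetical assumptions for illustration, not the paper's actual formulation.

```python
import math

def angular_distance(a, b):
    # Great-circle (angular) distance between two viewport centers,
    # each given as (longitude, latitude) in radians. Hypothetical helper.
    lon1, lat1 = a
    lon2, lat2 = b
    cos_d = (math.sin(lat1) * math.sin(lat2)
             + math.cos(lat1) * math.cos(lat2) * math.cos(lon1 - lon2))
    return math.acos(min(1.0, max(-1.0, cos_d)))  # clamp for float safety

def reward(saliency_value, center, gt_center, prev_center,
           alpha=1.0, beta=1.0, gamma=0.5):
    """Illustrative reward combining the three terms named in the abstract:
    (1) saliency of the selected NFOV, (2) closeness to the ground-truth
    viewport, and (3) smoothness relative to the previous viewport.
    The weights alpha/beta/gamma are assumptions, not the paper's values."""
    r_saliency = alpha * saliency_value            # reward salient viewports
    r_gt = -beta * angular_distance(center, gt_center)      # penalize GT error
    r_smooth = -gamma * angular_distance(center, prev_center)  # penalize jumps
    return r_saliency + r_gt + r_smooth
```

Under this sketch, an NFOV that sits exactly on the ground truth and does not move earns the full saliency reward, while drifting away from either the ground truth or the previous viewport lowers it.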
Pages: 3227-3238 (12 pages)