Proposal With Alignment: A Bi-Directional Transformer for 360° Video Viewport Proposal

Authors
Guo, Yichen [1]
Xu, Mai [1]
Jiang, Lai [1]
Deng, Xin [1]
Zhou, Jing [2]
Chen, Gaoxing [2]
Sigal, Leonid [3]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing 100191, Peoples R China
[2] Alibaba Cloud, Hangzhou 310052, Peoples R China
[3] Univ British Columbia, Dept Comp Sci, Vancouver, BC V6T 1Z4, Canada
Funding
Beijing Natural Science Foundation
Keywords
360° video; viewport proposal; viewport alignment; transformer; PREDICTION; MOVEMENT; HEAD
DOI
10.1109/TCSVT.2024.3419910
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract
People normally watch 360° videos through a head-mounted display, inside which only the content of viewports can be seen. Therefore, viewport proposal, which refers to detecting potential viewport candidates, plays an important role in many 360° video processing tasks. In this paper, we advance viewport proposal by further aligning the predicted viewports across frames for individual subjects. This provides a better methodology and a deeper perspective for learning human perceptual behaviours on 360° videos. Specifically, we first analyze three 360° video datasets and obtain several findings on human consistency, objectness and motion of viewports. Inspired by these findings, we propose a bi-directional transformer approach, named BiT, for 360° video viewport proposal and alignment. BiT is composed of a multi-level residual module, a bi-directional encoder-decoder module and a spherical matching module. In this way, viewports can be proposed and aligned by considering multi-level, bi-directional and non-local information. Moreover, the viewports aligned by BiT are in turn used to refine the proposed viewports and improve viewport proposal accuracy. Finally, we validate that our BiT approach is superior to state-of-the-art approaches on viewport proposal. In addition, the aligned viewports from BiT are verified to be effective in multiple applications, such as saliency prediction, trajectory prediction and perceptual video compression.
Pages: 11423-11437
Page count: 15