Proposal With Alignment: A Bi-Directional Transformer for 360° Video Viewport Proposal

Cited by: 0
Authors
Guo, Yichen [1 ]
Xu, Mai [1 ]
Jiang, Lai [1 ]
Deng, Xin [1 ]
Zhou, Jing [2 ]
Chen, Gaoxing [2 ]
Sigal, Leonid [3 ]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing 100191, Peoples R China
[2] Alibaba Cloud, Hangzhou 310052, Peoples R China
[3] Univ British Columbia, Dept Comp Sci, Vancouver, BC V6T 1Z4, Canada
Funding
Beijing Natural Science Foundation;
Keywords
360° video; viewport proposal; viewport alignment; transformer; PREDICTION; MOVEMENT; HEAD;
DOI
10.1109/TCSVT.2024.3419910
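Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code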
0808 ; 0809 ;
Abstract
People normally watch 360° videos through a head-mounted display, in which only the content within the viewport can be seen. Therefore, viewport proposal, which refers to detecting potential viewport candidates, plays an important role in many 360° video processing tasks. In this paper, we advance viewport proposal by further aligning the predicted viewports across frames for each individual subject. This provides a better methodology and a deeper perspective for learning human perceptual behaviours on 360° videos. Specifically, we first analyze three 360° video datasets and obtain several findings on the human consistency, objectness and motion of viewports. Inspired by these findings, we propose a bi-directional transformer approach, named BiT, for 360° video viewport proposal and alignment. BiT is composed of a multi-level residual module, a bi-directional encoder-decoder module and a spherical matching module. In this way, viewports can be proposed and aligned by considering multi-level, bi-directional and non-local information. Moreover, the viewports aligned by BiT are used to refine the proposed viewports, which in turn improves viewport proposal accuracy. Finally, we validate that our BiT approach outperforms state-of-the-art approaches on viewport proposal. In addition, the aligned viewports from BiT are verified to be effective in multiple applications, such as saliency prediction, trajectory prediction and perceptual video compression.
Pages: 11423-11437
Page count: 15