Proposal With Alignment: A Bi-Directional Transformer for 360° Video Viewport Proposal

Cited by: 0
Authors
Guo, Yichen [1]
Xu, Mai [1]
Jiang, Lai [1]
Deng, Xin [1]
Zhou, Jing [2]
Chen, Gaoxing [2]
Sigal, Leonid [3]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing 100191, Peoples R China
[2] Alibaba Cloud, Hangzhou 310052, Peoples R China
[3] Univ British Columbia, Dept Comp Sci, Vancouver, BC V6T 1Z4, Canada
Funding
Beijing Natural Science Foundation
Keywords
360 degrees video; viewport proposal; viewport alignment; transformer; PREDICTION; MOVEMENT; HEAD;
DOI
10.1109/TCSVT.2024.3419910
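Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract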
People normally watch 360° videos through a head-mounted display, in which only the content within the viewport is visible. Viewport proposal, which refers to detecting potential viewport candidates, therefore plays an important role in many 360° video processing tasks. In this paper, we advance viewport proposal by further aligning the predicted viewports across frames for each individual subject. This provides a better methodology and a deeper perspective for learning human perceptual behaviours on 360° videos. Specifically, we first analyze three 360° video datasets and obtain several findings on the human consistency, objectness and motion of viewports. Inspired by these findings, we propose a bi-directional transformer approach, named BiT, for 360° video viewport proposal and alignment. BiT is composed of a multi-level residual module, a bi-directional encoder-decoder module and a spherical matching module. In this way, viewports can be proposed and aligned by considering multi-level, bi-directional and non-local information. Moreover, the viewports aligned by BiT are in turn used to refine the proposed viewports and improve viewport proposal accuracy. Finally, we validate that our BiT approach outperforms state-of-the-art approaches on viewport proposal. In addition, the aligned viewports from BiT are verified to be effective in multiple applications, such as saliency prediction, trajectory prediction and perceptual video compression.
Pages: 11423-11437
Number of pages: 15