Object Detection-Based Video Retargeting With Spatial-Temporal Consistency

Cited by: 18
Authors
Lee, Seung Joon [1]
Lee, Siyeong [2]
Cho, Sung In [3]
Kang, Suk-Ju [1]
Affiliations
[1] Sogang Univ, Dept Elect Engn, Seoul 04107, South Korea
[2] NAVER LABS, Seongnam Si 13638, South Korea
[3] Dongguk Univ, Dept Multimedia Engn, Seoul 04620, South Korea
Keywords
Object detection; Object tracking; Distortion; Indexes; Computational complexity; Image sequences; Optimization; Video retargeting; Convolutional neural network
DOI
10.1109/TCSVT.2020.2981652
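Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809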
Abstract
This study proposes a video retargeting method based on deep neural network object detection. First, the meaningful regions of the input video are extracted as the bounding boxes produced by object detection; the area is defined by considering the size and number of the bounding boxes of the detected objects. The bounding boxes in each frame image are treated as regions of interest (RoIs). Second, a Siamese object tracking network is used to address the high computational complexity of the object detection network. The video is divided into scenes, and object detection is performed on the first frame image of each scene to obtain the first bounding box. Object tracking is then performed on each subsequent frame image until a scene change is detected. Third, the image is resized in the horizontal direction to alter its aspect ratio, and the 1D RoIs of the image are obtained by projecting the bounding boxes in the vertical direction. The proposed method then computes a grid map from the 1D RoIs to calculate the new coordinates of each column of the image. Finally, the retargeted video is obtained by arranging all retargeted frame images in sequence. In comparative experiments against various benchmark methods, the proposed method achieved an average bidirectional similarity score of 1.92, which is higher than that of the conventional methods. The proposed method was stable and satisfied viewers without causing the cognitive discomfort associated with conventional methods.
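The third step of the abstract (projecting bounding boxes to 1D RoIs, then deriving a grid map of new column coordinates) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state how column widths are assigned, so the rule below (RoI columns keep unit width, all other columns share the remaining width equally) is an assumption, and the function names `project_rois` and `grid_map` are hypothetical.

```python
def project_rois(boxes, width):
    """Project (x0, y0, x1, y1) boxes onto the x-axis as a per-column 0/1 mask."""
    mask = [0] * width
    for x0, _y0, x1, _y1 in boxes:
        # Only the horizontal extent matters after a vertical projection.
        for x in range(max(0, x0), min(width, x1)):
            mask[x] = 1
    return mask


def grid_map(mask, target_width):
    """Return the new left-edge x coordinate of every source column.

    Assumed rule (not taken from the paper): RoI columns keep a width of
    one pixel, and the remaining columns share the leftover width equally.
    """
    width = len(mask)
    n_roi = sum(mask)
    n_bg = width - n_roi
    if n_roi >= target_width or n_bg == 0:
        # No room to protect the RoIs: fall back to uniform scaling.
        widths = [target_width / width] * width
    else:
        bg_w = (target_width - n_roi) / n_bg
        widths = [1.0 if m else bg_w for m in mask]
    # Cumulative sum of column widths gives each column's new position.
    coords, acc = [], 0.0
    for w in widths:
        coords.append(acc)
        acc += w
    return coords


if __name__ == "__main__":
    mask = project_rois([(2, 0, 4, 5)], width=6)
    print(mask)               # [0, 0, 1, 1, 0, 0]
    print(grid_map(mask, 4))  # [0.0, 0.5, 1.0, 2.0, 3.0, 3.5]
```

In this toy case a 6-column frame is narrowed to 4 columns: the two RoI columns keep their full width while the four background columns are each halved. A real system would also need to smooth the grid map across frames to keep the result temporally consistent, which this sketch does not attempt.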
Pages: 4434-4439
Page count: 6