Editing like Humans: A Contextual, Multimodal Framework for Automated Video Editing

Cited by: 3
Authors
Koorathota, Sharath [1, 2]
Adelman, Patrick [2]
Cotton, Kelly [3]
Sajda, Paul [1]
Affiliations
[1] Columbia Univ, Dept Biomed Engn, New York, NY 10027 USA
[2] Fovea Inc, New York, NY 10001 USA
[3] CUNY, Grad Ctr, Dept Psychol, New York, NY USA
DOI
10.1109/CVPRW53098.2021.00186
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose an automated video editing model, which we term contextual and multimodal video editing (CMVE). The model leverages visual and textual metadata describing videos, integrating essential information from both modalities, and uses a learned editing style from a single example video to coherently combine clips. The editing model is useful for tasks such as generating news clip montages and highlight reels given a text query that describes the video storyline. The model exploits the perceptual similarity between video frames, objects in videos and text descriptions to emulate coherent video editing. Amazon Mechanical Turk participants made judgements comparing CMVE to expert human editing. Experimental results showed no significant difference in the CMVE vs human edited video in terms of matching the text query and the level of interest each generates, suggesting CMVE is able to effectively integrate semantic information across visual and textual modalities and create perceptually coherent quality videos typical of human video editors. We publicly release an online demonstration of our method.
Pages: 1701-1709
Page count: 9