ExpressEdit: Video Editing with Natural Language and Sketching

Cited by: 0
Authors
Tilekbay, Bekzat [1]
Yang, Saelyne [1]
Lewkowicz, Michal [2]
Suryapranata, Alex [1]
Kim, Juho [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Sch Comp, Daejeon, South Korea
[2] Yale Univ, Dept Comp Sci, POB 2158, New Haven, CT 06520 USA
Keywords
video editing; human-AI interaction; multimodal input
DOI
10.1145/3640543.3645164
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Informational videos serve as a crucial source for explaining conceptual and procedural knowledge to novices and experts alike. When producing informational videos, editors overlay text or images and trim footage to improve video quality and make the video more engaging. However, video editing can be difficult and time-consuming, especially for novice video editors, who often struggle to express and implement their editing ideas. To address this challenge, we first explored how multimodality (natural language (NL) and sketching, which are natural modalities humans use for expression) can be utilized to support video editors in expressing video editing ideas. We gathered 176 multimodal expressions of editing commands from 10 video editors, which revealed patterns in how NL and sketching are used to describe edit intents. Based on the findings, we present ExpressEdit, a system that enables editing videos via NL text and sketching on the video frame. Powered by LLM and vision models, the system interprets (1) temporal, (2) spatial, and (3) operational references in an NL command, as well as spatial references from sketching. The system implements the interpreted edits, which the user can then iterate on. An observational study (N=10) showed that ExpressEdit enhanced the ability of novice video editors to express and implement their edit ideas. The system allowed participants to perform edits more efficiently and generate more ideas by generating edits based on the user's multimodal edit commands and by supporting iteration on those commands. This work offers insights into the design of future multimodal interfaces and AI-based pipelines for video editing.
Pages: 515-536
Page count: 22