ExpressEdit: Video Editing with Natural Language and Sketching

Cited by: 0

Authors:

Tilekbay, Bekzat [1]

Yang, Saelyne [1]

Lewkowicz, Michal [2]

Suryapranata, Alex [1]

Kim, Juho [1]

Affiliations:

[1] Korea Adv Inst Sci & Technol, Sch Comp, Daejeon, South Korea

[2] Yale Univ, Dept Comp Sci, POB 2158, New Haven, CT 06520 USA

Source:

PROCEEDINGS OF 2024 29TH ANNUAL CONFERENCE ON INTELLIGENT USER INTERFACES, IUI 2024 | 2024

Keywords:

video editing; human-AI interaction; multimodal input;

DOI:

10.1145/3640543.3645164

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Informational videos serve as a crucial source for explaining conceptual and procedural knowledge to novices and experts alike. When producing informational videos, editors edit videos by overlaying text/images or trimming footage to enhance the video quality and make it more engaging. However, video editing can be difficult and time-consuming, especially for novice video editors who often struggle with expressing and implementing their editing ideas. To address this challenge, we first explored how multimodality, specifically natural language (NL) and sketching, which are natural modalities humans use for expression, can be utilized to support video editors in expressing video editing ideas. We gathered 176 multimodal expressions of editing commands from 10 video editors, which revealed the patterns of use of NL and sketching in describing edit intents. Based on the findings, we present ExpressEdit, a system that enables editing videos via NL text and sketching on the video frame. Powered by LLM and vision models, the system interprets (1) temporal, (2) spatial, and (3) operational references in an NL command and spatial references from sketching. The system implements the interpreted edits, which the user can then iterate on. An observational study (N=10) showed that ExpressEdit enhanced the ability of novice video editors to express and implement their edit ideas. The system allowed participants to perform edits more efficiently and generate more ideas by generating edits based on the user's multimodal edit commands and supporting iterations on the editing commands. This work offers insights into the design of future multimodal interfaces and AI-based pipelines for video editing.


Pages: 515-536

Page count: 22
