Iterative Text-Based Editing of Talking-Heads Using Neural Retargeting

Cited by: 10
Authors
Yao, Xinwei [1]
Fried, Ohad [2, 3]
Fatahalian, Kayvon [1]
Agrawala, Maneesh [1]
Affiliations
[1] Stanford Univ, Dept Comp Sci, 353 Jane Stanford Way, Stanford, CA 94305 USA
[2] Interdisciplinary Ctr Herzliya, Herzliyya, Israel
[3] IDC Herzliya, Efi Arazi Sch Comp Sci, IL-46150 Herzliyya, Israel
Source
ACM TRANSACTIONS ON GRAPHICS | 2021 / Vol. 40 / Issue 03
Funding
National Science Foundation (USA);
Keywords
Text-based video editing; talking-heads; phonemes; retargeting;
DOI
10.1145/3449063
Chinese Library Classification
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
We present a text-based tool for editing talking-head video that enables an iterative editing workflow. On each iteration users can edit the wording of the speech, further refine mouth motions if necessary to reduce artifacts, and manipulate non-verbal aspects of the performance by inserting mouth gestures (e.g., a smile) or changing the overall performance style (e.g., energetic, mumble). Our tool requires only 2 to 3 minutes of the target actor video and it synthesizes the video for each iteration in about 40 seconds, allowing users to quickly explore many editing possibilities as they iterate. Our approach is based on two key ideas. (1) We develop a fast phoneme search algorithm that can quickly identify phoneme-level subsequences of the source repository video that best match a desired edit. This enables our fast iteration loop. (2) We leverage a large repository of video of a source actor and develop a new self-supervised neural retargeting technique for transferring the mouth motions of the source actor to the target actor. This allows us to work with relatively short target actor videos, making our approach applicable in many real-world editing scenarios. Finally, our refinement and performance controls give users the ability to further fine-tune the synthesized results.
Pages: 14
Related papers
50 in total
  • [1] Text-based Editing of Talking-head Video
    Fried, Ohad
    Tewari, Ayush
    Zollhofer, Michael
    Finkelstein, Adam
    Shechtman, Eli
    Goldman, Dan B.
    Genova, Kyle
    Jin, Zeyu
    Theobalt, Christian
    Agrawala, Maneesh
    [J]. ACM TRANSACTIONS ON GRAPHICS, 2019, 38 (04):
  • [2] Talking-heads attention-based knowledge representation for link prediction
    Wang, Shirui
    Zhou, Wen'an
    Zhou, Qiang
    [J]. COMPUTER SPEECH AND LANGUAGE, 2022, 74
  • [3] Non-Linear Editing of Text-Based Screencasts
    Park, Jungkook
    Park, Yeong Hoon
    Oh, Alice
    [J]. UIST 2018: PROCEEDINGS OF THE 31ST ANNUAL ACM SYMPOSIUM ON USER INTERFACE SOFTWARE AND TECHNOLOGY, 2018, : 403 - 410
  • [4] TrojanEdit: Backdooring Text-Based Image Editing Models
    Guo, Ji
    Chen, Peihong
    Jiang, Wenbo
    Lu, Guoming
    [J]. arXiv,
  • [5] Text-Based Spam Tweets Detection Using Neural Networks
    Mardi, Vanyashree
    Kini, Anvaya
    Sukanya, V. M.
    Rachana, S.
    [J]. ADVANCES IN COMPUTING AND INTELLIGENT SYSTEMS, ICACM 2019, 2020, : 401 - 408
  • [6] Imagic: Text-Based Real Image Editing with Diffusion Models
    Kawar, Bahjat
    Zada, Shiran
    Lang, Oran
    Tov, Omer
    Chang, Huiwen
    Dekel, Tali
    Mosseri, Inbar
    Irani, Michal
    [J]. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6007 - 6017
  • [7] Multi-modal molecule structure–text model for text-based retrieval and editing
    Shengchao Liu
    Weili Nie
    Chengpeng Wang
    Jiarui Lu
    Zhuoran Qiao
    Ling Liu
    Jian Tang
    Chaowei Xiao
    Animashree Anandkumar
    [J]. Nature Machine Intelligence, 2023, 5 : 1447 - 1457
  • [8] A Frame of Mind: Frame-based vs. Text-based Editing
    Brown, Neil
    Kyfonidis, Charalampos
    Weill-Tessier, Pierre
    Becker, Brett
    Dillane, Joe
    Kolling, Michael
    [J]. UKICER '21: PROCEEDINGS OF THE 2021 UNITED KINGDOM AND IRELAND COMPUTING EDUCATION RESEARCH CONFERENCE, 2021,
  • [9] CONTEXT-AWARE PROSODY CORRECTION FOR TEXT-BASED SPEECH EDITING
    Morrison, Max
    Rencker, Lucas
    Jin, Zeyu
    Bryan, Nicholas J.
    Caceres, Juan-Pablo
    Pardo, Bryan
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7038 - 7042
  • [10] Neural and Linguistic Considerations for Assessing Moral Intuitions Using Text-Based Stimuli
    Bretl, Brandon L.
    [J]. JOURNAL OF PSYCHOLOGY, 2021, 155 (01): : 90 - 114