VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition

Cited by: 28
Authors
Wang, Quan [1]
Moreno, Ignacio Lopez [1]
Saglam, Mert [1]
Wilson, Kevin [1]
Chiao, Alan [1]
Liu, Renjie [1]
He, Yanzhang [1]
Li, Wei [1]
Pelecanos, Jason [1]
Nika, Marily [1]
Gruenstein, Alexander [1]
Affiliation
[1] Google LLC, Mountain View, CA 94043 USA
Source
INTERSPEECH 2020
Keywords
source separation; speaker recognition; speech recognition; asymmetric loss; adaptive suppression
DOI
10.21437/Interspeech.2020-1193
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
We introduce VoiceFilter-Lite, a single-channel source separation model that runs on-device to preserve only the speech signals from a target user, as part of a streaming speech recognition system. Delivering such a model presents numerous challenges: it should improve recognition performance when the input signal consists of overlapped speech, and it must not hurt speech recognition performance under any other acoustic condition. In addition, the model must be tiny, fast, and perform inference in a streaming fashion, so that it has minimal impact on CPU, memory, battery, and latency. We propose novel techniques to meet these multi-faceted requirements, including a new asymmetric loss and adaptive runtime suppression strength. We also show that such a model can be quantized as an 8-bit integer model and run in real time.
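The abstract names two of the proposed techniques, an asymmetric loss and adaptive runtime suppression strength, without defining them. The Python sketch below illustrates one plausible reading of each; it is not the authors' implementation. Everything in it is an assumption: the function names `asymmetric_l2_loss` and `adaptive_suppression`, the plain-list spectrogram representation, the per-frame `overlap_prob` input (a stand-in for some detector of overlapped speech), and the constants `alpha`, `beta`, `a`, and `b` are hypothetical and should be checked against the full paper.

```python
"""Hypothetical sketch of an asymmetric loss and adaptive suppression.

Assumptions (not taken from this record): spectrograms are lists of
per-frame lists of (compressed) magnitudes, and all constants are
illustrative rather than the paper's tuned values.
"""
from typing import List

Spectrogram = List[List[float]]


def asymmetric_l2_loss(clean: Spectrogram, enhanced: Spectrogram, alpha: float) -> float:
    """Squared error that penalizes over-suppression more than under-suppression.

    A positive difference (clean - enhanced > 0) means the model removed
    energy belonging to the target speaker, so it is scaled by alpha > 1.
    """
    total = 0.0
    for clean_frame, enh_frame in zip(clean, enhanced):
        for c, e in zip(clean_frame, enh_frame):
            diff = c - e
            if diff > 0:
                diff *= alpha  # over-suppression: weight the error more heavily
            total += diff * diff
    return total


def adaptive_suppression(
    inputs: Spectrogram,
    enhanced: Spectrogram,
    overlap_prob: List[float],
    beta: float = 0.9,  # illustrative smoothing factor
    a: float = 1.0,     # illustrative scale
    b: float = 0.0,     # illustrative offset
) -> Spectrogram:
    """Streaming, frame-by-frame blend of enhanced and unprocessed features.

    The weight w is an exponential moving average of a clipped linear
    function of the per-frame overlap probability; w = 0 passes the input
    through unchanged, w = 1 uses the enhanced features only.
    """
    outputs: Spectrogram = []
    w = 0.0
    for x_frame, e_frame, p in zip(inputs, enhanced, overlap_prob):
        target = min(max(a * p + b, 0.0), 1.0)  # keep the target weight in [0, 1]
        w = beta * w + (1.0 - beta) * target     # smooth across frames
        outputs.append([w * e + (1.0 - w) * x for x, e in zip(x_frame, e_frame)])
    return outputs


if __name__ == "__main__":
    # Over-suppressing by 0.5 with alpha = 10 costs (0.5 * 10) ** 2.
    print(asymmetric_l2_loss([[1.0, 0.5]], [[0.5, 0.5]], alpha=10.0))  # 25.0
    # Confident overlap on every frame ramps the weight up gradually.
    print(adaptive_suppression([[0.0], [0.0]], [[1.0], [1.0]], [1.0, 1.0], beta=0.5))
```

Weighting positive errors by `alpha > 1` is meant to make deleting target-speaker energy costlier than leaving some interference in, in line with the abstract's requirement not to hurt recognition under other conditions. The moving average keeps the suppression weight from jumping between frames, which suits streaming inference.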
Pages: 2677-2681
Number of pages: 5