VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition

Cited by: 28

Authors:

Wang, Quan [1]

Moreno, Ignacio Lopez [1]

Saglam, Mert [1]

Wilson, Kevin [1]

Chiao, Alan [1]

Liu, Renjie [1]

He, Yanzhang [1]

Li, Wei [1]

Pelecanos, Jason [1]

Nika, Marily [1]

Gruenstein, Alexander [1]

Affiliations:

[1] Google LLC, Mountain View, CA 94043 USA

Source:

INTERSPEECH 2020 | 2020

Keywords:

source separation; speaker recognition; speech recognition; asymmetric loss; adaptive suppression;

DOI:

10.21437/Interspeech.2020-1193

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification code:

100104 ; 100213 ;

Abstract:

We introduce VoiceFilter-Lite, a single-channel source separation model that runs on the device to preserve only the speech signals from a target user, as part of a streaming speech recognition system. Delivering such a model presents numerous challenges: it should improve performance when the input signal consists of overlapped speech, and it must not hurt speech recognition performance under any other acoustic condition. In addition, the model must be tiny, fast, and perform inference in a streaming fashion, in order to have minimal impact on CPU, memory, battery and latency. We propose novel techniques to meet these multi-faceted requirements, including a new asymmetric loss and an adaptive runtime suppression strength. We also show that such a model can be quantized as an 8-bit integer model and run in real time.

Citation

Pages: 2677-2681

Page count: 5

