Revisiting joint decoding based multi-talker speech recognition with DNN acoustic model

Cited by: 1

Authors:

Kocour, Martin [1]

Zmolikova, Katerina [1]

Ondel, Lucas [1,3]

Svec, Jan [1]

Delcroix, Marc [2]

Ochiai, Tsubasa [2]

Burget, Lukas [1]

Cernocky, Jan Honza [1]

Affiliations:

[1] Brno Univ Technol, Fac Informat Technol, Speech FIT, Brno, Czech Republic

[2] NTT Corp, Tokyo, Japan

[3] Univ Paris Saclay, LISN, CNRS, Gif Sur Yvette, France

Source:

INTERSPEECH 2022 | 2022

Keywords:

Multi-talker speech recognition; Permutation invariant training; Factorial hidden Markov models; Deep neural networks

DOI:

10.21437/Interspeech.2022-10406

Chinese Library Classification (CLC):

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In typical multi-talker speech recognition systems, a neural network-based acoustic model predicts senone state posteriors for each speaker. These are later used by a single-talker decoder, which is applied to each speaker-specific output stream separately. In this work, we argue that such a scheme is sub-optimal and propose a principled solution that decodes all speakers jointly. We modify the acoustic model to predict joint state posteriors for all speakers, enabling the network to express uncertainty about the attribution of parts of the speech signal to the speakers. We employ a joint decoder that can make use of this uncertainty together with higher-level language information. For this, we revisit decoding algorithms used in factorial generative models in early multi-talker speech recognition systems. In contrast with these early works, we replace the GMM acoustic model with a DNN, which provides greater modeling power and simplifies part of the inference. We demonstrate the advantage of joint decoding in proof-of-concept experiments on a mixed-TIDIGITS dataset.
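
The idea of decoding two speakers jointly from joint state posteriors can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes two speakers who share one small HMM, uses the joint log-posteriors directly as frame scores (a hybrid system would normally divide by state priors first), and searches the full product state space with plain Viterbi. The function name `joint_viterbi` and all array layouts are hypothetical choices made for this illustration.

```python
import numpy as np

def joint_viterbi(log_post, log_A, log_pi):
    """Viterbi search over the product state space of two speakers.

    log_post: (T, S, S) joint log-scores; log_post[t, i, j] scores speaker 1
              in state i and speaker 2 in state j at frame t (assumed layout).
    log_A:    (S, S) log transition matrix shared by both speakers,
              log_A[p, q] = log P(q | p).
    log_pi:   (S,) log initial state probabilities.
    Returns one state path per speaker, each of length T.
    """
    T, S, _ = log_post.shape
    delta = log_pi[:, None] + log_pi[None, :] + log_post[0]
    back = np.zeros((T, S, S, 2), dtype=int)
    for t in range(1, T):
        # scores[i_prev, j_prev, i, j]: both speakers move independently.
        scores = (delta[:, :, None, None]
                  + log_A[:, None, :, None]
                  + log_A[None, :, None, :])
        flat = scores.reshape(S * S, S, S)
        best_prev = flat.argmax(axis=0)
        delta = flat.max(axis=0) + log_post[t]
        back[t, :, :, 0], back[t, :, :, 1] = np.unravel_index(best_prev, (S, S))
    # Backtrace from the best final joint state.
    i, j = np.unravel_index(delta.argmax(), (S, S))
    path1, path2 = [int(i)], [int(j)]
    for t in range(T - 1, 0, -1):
        i, j = back[t, i, j]
        path1.append(int(i))
        path2.append(int(j))
    return path1[::-1], path2[::-1]
```

The search cost grows with the fourth power of the per-speaker state count, so the sketch is only practical for toy state spaces; it is meant to show where a joint score per state pair enters the recursion, in contrast to running a single-talker decoder on each output stream separately.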


Pages: 4955-4959

Number of pages: 5
