DNN-based Feature Transformation for Speech Recognition Using Throat Microphone

Cited by: 0

Authors:

Lin, Shengke [1]

Tsunakawa, Takashi [1]

Nishida, Masafumi [1]

Nishimura, Masafumi [1]

Affiliation:

[1] Shizuoka Univ, Grad Sch Integrated Sci & Technol, Shizuoka, Japan

Source:

2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017) | 2017

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, methods];

Discipline classification code:

081202 ;

Abstract:

In this paper, we focus on using a throat microphone as a noise-robust device, because its signal is much less affected by surrounding noise than the signal of a conventional acoustic microphone. However, a throat microphone can record only a narrow frequency band, and its characteristics differ from those of an acoustic microphone. As a result, speech recognition performance degrades greatly when a throat microphone is simply substituted for a conventional acoustic microphone. To overcome this problem, we propose a deep neural network (DNN)-based feature transformation method used together with model adaptation. In a continuous digit recognition experiment, the proposed method reduced the word error rate (WER) obtained with the throat microphone from 41.4% to 17.6%.
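
For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function names `word_error_rate` and `relative_reduction` and the sample digit strings are illustrative assumptions, not taken from the paper, and the paper's own scoring tool may differ in detail.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # aligning i reference words with an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Relative reduction of an error rate, as a fraction of the starting value."""
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical digit strings, used only to exercise the function.
    print(word_error_rate("one two three four", "one three three"))
    # Figures reported in the abstract: 41.4% before, 17.6% after.
    print(round(relative_reduction(41.4, 17.6), 3))
```

Applied to the figures in the abstract, the drop from 41.4% to 17.6% is an absolute reduction of 23.8 percentage points, or a relative reduction of roughly 57%.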


Pages: 596-599

Page count: 4
