Two-Microphone End-to-End Speaker Joint Identification and Localization Via Convolutional Neural Networks

Cited by: 1

Authors:

Salvati, Daniele [1]

Drioli, Carlo [1]

Foresti, Gian Luca [1]

Affiliations:

[1] Univ Udine, Dept Math Comp Sci & Phys, Udine, Italy

Source:

2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2020

Keywords:

Convolutional neural network; end-to-end system; raw waveform; speaker identification; speaker localization; two-microphone array; acoustic source localization; noisy

DOI:

10.1109/ijcnn48605.2020.9206674

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present an end-to-end scheme based on convolutional neural networks (CNNs) for joint speaker identification and localization. We investigate the possibility of estimating both the direction of arrival (DOA) and the identity of the speaker in far-field noisy and reverberant conditions using a two-channel microphone array. The proposed CNN is designed to map the raw waveforms of the two channels to the speaker identity and to the DOA of the speech signal. We analyze the identification and localization performance with simulated experiments in noisy and reverberant conditions.
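
The mapping described in the abstract, from two raw waveform channels to a speaker identity and a DOA, can be illustrated with a minimal sketch. This is not the authors' architecture, which the abstract does not specify. It assumes a single shared 1-D convolutional layer followed by two classification heads, with DOA treated as a choice among discrete angle bins. All sizes (8 filters, kernel length 16, 4 speakers, 9 DOA bins), the function names, and the random untrained weights are hypothetical and chosen only so the example runs with the Python standard library.

```python
import math
import random

def conv1d_relu(x, kernels, stride):
    """x: list of channels, each a list of samples.
    kernels: list of filters, each holding one weight list per channel.
    Returns one ReLU-activated feature map per filter."""
    n_ch, length = len(x), len(x[0])
    k = len(kernels[0][0])
    maps = []
    for filt in kernels:
        fmap = []
        for start in range(0, length - k + 1, stride):
            s = sum(filt[c][i] * x[c][start + i]
                    for c in range(n_ch) for i in range(k))
            fmap.append(max(0.0, s))
        maps.append(fmap)
    return maps

def linear(v, weights):
    """Matrix-vector product: one output per weight row."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]

def softmax(z):
    m = max(z)
    e = [math.exp(a - m) for a in z]
    total = sum(e)
    return [a / total for a in e]

def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]

def make_model(rng, n_filters=8, kernel=16, n_speakers=4, n_doa_bins=9):
    # Hypothetical sizes; weights are random, i.e. the model is untrained.
    return {
        "conv": [rand_matrix(rng, 2, kernel) for _ in range(n_filters)],
        "speaker_head": rand_matrix(rng, n_speakers, n_filters),
        "doa_head": rand_matrix(rng, n_doa_bins, n_filters),
    }

def forward(model, left, right):
    """Shared convolutional features feed two separate output heads."""
    feats = conv1d_relu([left, right], model["conv"], stride=4)
    pooled = [max(f) for f in feats]  # global max pooling per filter
    speaker_probs = softmax(linear(pooled, model["speaker_head"]))
    doa_probs = softmax(linear(pooled, model["doa_head"]))
    return speaker_probs, doa_probs

rng = random.Random(0)
model = make_model(rng)
# Synthetic two-channel frame: the right channel is the left delayed by 3 samples.
left = [math.sin(2 * math.pi * 200 * t / 16000) for t in range(400)]
right = [0.0] * 3 + left[:-3]
speaker_probs, doa_probs = forward(model, left, right)
print(len(speaker_probs), len(doa_probs))
```

Sharing the convolutional features between the two heads is one common way to realize a joint task; a trained system would learn the filter weights from labeled two-channel recordings rather than use random values.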

Pages: 6
