Two-Microphone End-to-End Speaker Joint Identification and Localization Via Convolutional Neural Networks

Cited by: 1
Authors
Salvati, Daniele [1]
Drioli, Carlo [1]
Foresti, Gian Luca [1]
Affiliations
[1] Univ Udine, Dept Math Comp Sci & Phys, Udine, Italy
Keywords
Convolutional neural network; end-to-end system; raw waveform; speaker identification; speaker localization; two-microphone array; acoustic source localization; noisy
DOI
10.1109/ijcnn48605.2020.9206674
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present an end-to-end scheme based on convolutional neural networks (CNNs) for speaker joint identification and localization. We investigate the possibility of estimating both the direction of arrival (DOA) and the identity of the speaker in far-field noisy and reverberant conditions using a two-channel microphone array. The proposed CNN is designed to map the raw waveforms of the two channels to the speaker identity and to the DOA of the speech signal. We analyze the identification and localization performance with simulated experiments in noisy and reverberant conditions.
Pages: 6