Two-Microphone End-to-End Speaker Joint Identification and Localization Via Convolutional Neural Networks

Cited by: 1
Authors
Salvati, Daniele [1]
Drioli, Carlo [1]
Foresti, Gian Luca [1]
Affiliations
[1] Univ Udine, Dept Math Comp Sci & Phys, Udine, Italy
Keywords
Convolutional neural network; end-to-end system; raw waveform; speaker identification; speaker localization; two-microphone array; ACOUSTIC SOURCE LOCALIZATION; NOISY
DOI
10.1109/ijcnn48605.2020.9206674
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present an end-to-end scheme based on convolutional neural networks (CNNs) for joint speaker identification and localization. We investigate the possibility of estimating both the direction of arrival (DOA) and the identity of the speaker in far-field noisy and reverberant conditions using a two-channel microphone array. The proposed CNN is designed to map the raw waveforms of the two channels to the speaker identity and to the DOA of the speaker's speech signal. We analyze the identification and localization performance with simulated experiments in noisy and reverberant conditions.
Pages: 6
Related papers
50 in total
  • [11] End-to-End Chinese Speaker Identification
    Yu, Dian
    Zhou, Ben
    Yu, Dong
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 2274 - 2285
  • [12] An End-to-End Compression Framework Based on Convolutional Neural Networks
    Jiang, Feng
    Tao, Wen
    Liu, Shaohui
    Ren, Jie
    Guo, Xun
    Zhao, Debin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2018, 28 (10) : 3007 - 3018
  • [13] An End-to-End Compression Framework Based on Convolutional Neural Networks
    Tao, Wen
    Jiang, Feng
    Zhang, Shengping
    Ren, Jie
    Shi, Wuzhen
    Zuo, Wangmeng
    Guo, Xun
    Zhao, Debin
    2017 DATA COMPRESSION CONFERENCE (DCC), 2017, : 463 - 463
  • [14] Tied Hidden Factors in Neural Networks for End-to-End Speaker Recognition
    Miguel, Antonio
    Llombart, Jorge
    Ortega, Alfonso
    Lleida, Eduardo
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2819 - 2823
  • [15] END-TO-END DETECTION OF ATTACKS TO AUTOMATIC SPEAKER RECOGNIZERS WITH TIME-ATTENTIVE LIGHT CONVOLUTIONAL NEURAL NETWORKS
    Monteiro, Joao
    Alam, Jahangir
    Falk, Tiago H.
    2019 IEEE 29TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2019,
  • [16] End-to-End Neural Speaker Diarization with Absolute Speaker Loss
    Wang, Chao
    Li, Jie
    Fang, Xiang
    Kang, Jian
    Li, Yongxiang
    INTERSPEECH 2023, 2023, : 3577 - 3581
  • [17] End-to-end Cooperative Localization via Neural Feature Sharing
    Gao, Letian
    Xiang, Hao
    Xia, Xin
    Ma, Jiaqi
    2024 35TH IEEE INTELLIGENT VEHICLES SYMPOSIUM, IEEE IV 2024, 2024, : 553 - 558
  • [18] Towards End-to-End Speech Recognition with Deep Convolutional Neural Networks
    Zhang, Ying
    Pezeshki, Mohammad
    Brakel, Philemon
    Zhang, Saizheng
    Laurent, Cesar
    Bengio, Yoshua
    Courville, Aaron
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 410 - 414
  • [19] Towards End-to-end Text Spotting with Convolutional Recurrent Neural Networks
    Li, Hui
    Wang, Peng
    Shen, Chunhua
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 5248 - 5256
  • [20] Convolutional Dictionary Learning by End-To-End Training of Iterative Neural Networks
    Kofler, Andreas
    Wald, Christian
    Schaeffter, Tobias
    Haltmeier, Markus
    Kolbitsch, Christoph
    2022 30TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2022), 2022, : 1213 - 1217