Location Feature Integration for Clustering-Based Speech Separation in Distributed Microphone Arrays

Cited by: 18

Authors:

Souden, Mehrez [1]

Kinoshita, Keisuke [2]

Delcroix, Marc [2]

Nakatani, Tomohiro [2]

Affiliations:

[1] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30308 USA

[2] NTT Commun Sci Labs, Kyoto 6190237, Japan

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2014 / Vol. 22 / Issue 02

Keywords:

Blind source separation; decision fusion; distributed; microphone array processing; location-based speech clustering; speech enhancement; MULTICHANNEL WIENER FILTER; BLIND SOURCE SEPARATION; EM; ALGORITHMS; MIXTURES;

DOI:

10.1109/TASLP.2013.2292308

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

In distributed microphone arrays (DMAs), the source location information can be defined at the intra- and inter-node levels. Indeed, while the first type of information results from the diversity of acoustic channels recorded by microphones embedded in the same node, the second is attributed to the differences between the acoustic channels observed by spatially distributed nodes. Both cues are very useful in DMA processing, and the aim of this paper is to utilize both of them to cluster and separate multiple competing speech signals. To capture the intra-node information, we employ the normalized recording vector, while at the inter-node level, we consider different features including the energy level differences with and without the phase differences between nodes. We model the intra-node information using the Watson mixture model (WMM), and propose using the Gamma mixture model (GaMM), Dirichlet mixture model (DMM), and WMM to model different inter-node location features. Furthermore, we propose several integrations of the intra-node and inter-node feature contributions to cluster speech recordings using the expectation maximization algorithm. Finally, simulation results are provided to demonstrate the performance of all ensuing methods.


Pages: 354-367

Page count: 14
