Sound Source Localization Inside a Structure Under Semi-Supervised Conditions

Cited by: 2
Authors
Kita, Shunsuke [1]
Kajikawa, Yoshinobu [2]
Affiliations
[1] Osaka Res Inst Ind Sci & Technol, Div Elect & Mech Syst, Osaka 594115, Japan
[2] Kansai Univ, Fac Engn Sci, Osaka 5648680, Japan
Keywords
Data models; Adaptation models; Acoustics; Speech processing; Predictive models; Location awareness; Training; Sound source localization; domain transfer; acoustic-structure coupling; t-distributed stochastic neighbor embedding
DOI
10.1109/TASLP.2023.3263776
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We propose a method for applying a sound source localization (SSL) model, trained on simulated data, to a real-world environment by means of a domain transfer (DT) model, for SSL inside a structure. The method consists of two models: the SSL model predicts the position of a sound source inside the structure, and the DT model transforms real data into pseudo-simulation data so that the SSL model trained on simulation data can be applied to real data. Because the simulation is imperfect, real data fall outside the distribution of the simulated training data, and the SSL model must extrapolate when given them directly. Data transformed by the DT model, in contrast, lie within the feature space covered by the simulation, so the SSL model interpolates, which improves its real-world performance. The model input is the frequency spectra measured by accelerometers on the outer surface of the structure, and the output is the predicted position of the sound source. The SSL model is built using deep and convolutional neural networks, and the DT model is built using an autoencoder, a deep convolutional autoencoder, or pix2pix. Two-dimensional t-distributed stochastic neighbor embedding (t-SNE) distributions indicate that pix2pix performs best as the DT model. Furthermore, compared with the case where no transformation is applied, the method improves SSL performance by 57% for the classification problem and by 27% for the regression problem.
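The two-stage idea in the abstract (train the SSL model on simulation only, then pass real data through a DT model before prediction) can be illustrated with a toy sketch. This is not the authors' implementation: the data are synthetic, a nearest-centroid classifier stands in for the neural-network SSL model, and a per-bin affine least-squares map stands in for the autoencoder or pix2pix DT model. All names (fit_ssl, fit_dt, gain, offset, and so on) are hypothetical, and the sketch assumes that some paired real and simulated spectra are available for fitting the DT model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_positions, n_bins, n_per_position = 4, 16, 30

# Synthetic "simulation" spectra: one template per source position plus noise.
templates = rng.normal(size=(n_positions, n_bins))
labels = np.repeat(np.arange(n_positions), n_per_position)
sim = templates[labels] + 0.05 * rng.normal(size=(labels.size, n_bins))

# Synthetic "real" spectra: the same data distorted by an assumed per-bin
# gain and offset, standing in for simulation error.
gain = rng.uniform(0.5, 1.5, size=n_bins)
offset = rng.normal(scale=3.0, size=n_bins)
real = sim * gain + offset + 0.05 * rng.normal(size=sim.shape)

def fit_ssl(x, y):
    """Stand-in SSL model: one centroid per source position."""
    return np.stack([x[y == c].mean(axis=0) for c in range(n_positions)])

def predict_ssl(centroids, x):
    """Assign each spectrum to the position with the nearest centroid."""
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

def fit_dt(real_x, sim_x):
    """Stand-in DT model: per-bin least-squares affine map, real -> simulation."""
    r_mean, s_mean = real_x.mean(axis=0), sim_x.mean(axis=0)
    r_centered = real_x - r_mean
    slope = (r_centered * (sim_x - s_mean)).sum(axis=0) / (r_centered ** 2).sum(axis=0)
    return slope, s_mean - slope * r_mean

def apply_dt(dt, real_x):
    """Transform real spectra into pseudo-simulation spectra."""
    slope, intercept = dt
    return real_x * slope + intercept

# Fit the DT model on half of the paired samples; evaluate on the other half.
train = np.arange(labels.size) % 2 == 0
test = ~train

ssl = fit_ssl(sim, labels)            # SSL model sees simulation data only
dt = fit_dt(real[train], sim[train])  # DT model sees paired real/sim data

acc_raw = (predict_ssl(ssl, real[test]) == labels[test]).mean()
acc_dt = (predict_ssl(ssl, apply_dt(dt, real[test])) == labels[test]).mean()
print(f"accuracy without DT: {acc_raw:.2f}, with DT: {acc_dt:.2f}")
```

In this toy setting the DT step pulls the held-out real spectra back toward the region the SSL model was trained on, which mirrors the interpolation argument in the abstract; the printed accuracies describe only this synthetic example and say nothing about the reported 57% and 27% improvements.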
Pages: 1397-1408
Number of pages: 12