ATTENTION-BASED SCALING ADAPTATION FOR TARGET SPEECH EXTRACTION

Times cited: 5
Authors
Han, Jiangyu [1, 2]
Rao, Wei [2 ]
Long, Yanhua [1 ]
Liang, Jiaen [3 ]
Affiliations
[1] Shanghai Normal Univ, Shanghai, Peoples R China
[2] Tencent Corp, Tencent Ethereal Audio Lab, Shenzhen, Peoples R China
[3] Unisound AI Technol Co Ltd, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Target speech extraction; time-domain; attention; adaptation
DOI
10.1109/ASRU51503.2021.9687903
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Target speech extraction (TSE) has attracted widespread attention in recent years. In this work, we investigate the dynamic interaction between different mixtures and the target speaker in order to exploit discriminative target speaker clues. We propose a special attention mechanism, which introduces no additional parameters, in a scaling adaptation layer to better adapt the network towards extracting the target speech. Furthermore, by introducing a mixture embedding matrix pooling method, our proposed attention-based scaling adaptation (ASA) exploits the target speaker clues more efficiently. Experimental results on the spatialized reverberant WSJ0-2mix dataset demonstrate that the proposed method effectively improves target speech extraction performance. In addition, we find that, under the same network configuration, ASA in the single-channel condition achieves performance gains competitive with those obtained from two-channel mixtures with inter-microphone phase difference (IPD) features.
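The abstract does not give the ASA equations. The sketch below is a minimal, hypothetical illustration, assuming NumPy, of what a parameter-free attention feeding a multiplicative scaling adaptation layer can look like: the target speaker embedding acts as the query over a time-pooled mixture embedding matrix, using only dot products, softmax, and mean pooling, so no learned weights are added. The function names, the pooling window `pool`, the mean-pooling choice, and the additive fusion `spk + context` are all assumptions for illustration, not the authors' formulation.

```python
import numpy as np


def mean_pool_frames(x: np.ndarray, pool: int) -> np.ndarray:
    """Average non-overlapping blocks of `pool` frames: (T, D) -> (ceil(T/pool), D).

    Hypothetical stand-in for "mixture embedding matrix pooling".
    """
    t, d = x.shape
    pad = (-t) % pool
    if pad:
        # Repeat the last frame so every block contains exactly `pool` frames.
        x = np.concatenate([x, np.repeat(x[-1:], pad, axis=0)], axis=0)
    return x.reshape(-1, pool, d).mean(axis=1)


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def attention_scaling_adaptation(mix: np.ndarray, spk: np.ndarray, pool: int = 4):
    """Hypothetical parameter-free attention + scaling adaptation.

    mix: (T, D) mixture embeddings; spk: (D,) target speaker embedding.
    Returns the adapted (T, D) features and the (T',) attention weights.
    """
    d = spk.shape[0]
    keys = mean_pool_frames(mix, pool)            # (T', D) pooled mixture matrix
    weights = softmax(keys @ spk / np.sqrt(d))    # (T',) scaled dot product, no projections
    context = weights @ keys                      # (D,) attention-weighted mixture summary
    scale = spk + context                         # ASSUMED fusion of speaker and mixture cues
    return mix * scale[None, :], weights          # element-wise scaling of every frame


# Usage with random toy embeddings: 10 frames, 8-dimensional features.
rng = np.random.default_rng(0)
mix = rng.standard_normal((10, 8))
spk = rng.standard_normal(8)
out, w = attention_scaling_adaptation(mix, spk, pool=4)
print(out.shape, w.shape)  # (10, 8) (3,)
```

Pooling the mixture matrix before attention shortens the sequence the speaker query attends over (here from 10 frames to 3 blocks), which is one plausible reading of why pooling could make the clue extraction more efficient; the paper itself should be consulted for the actual design.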
Pages: 658-662
Number of pages: 5