Attention-based neural network for end-to-end music separation

Cited by: 4
Authors
Wang, Jing [1, 5]
Liu, Hanyue [1]
Ying, Haorong [1]
Qiu, Chuhan [2]
Li, Jingxin [3]
Anwar, Muhammad Shahid [4, 6]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Commun Univ China, Beijing, Peoples R China
[3] China Elect Standardizat Inst, Beijing, Peoples R China
[4] Gachon Univ, Seongnam, South Korea
[5] Beijing Inst Technol, Beijing 100081, Peoples R China
[6] Gachon Univ, Seongnam 13120, South Korea
Keywords
channel attention; densely connected network; end-to-end music separation
DOI
10.1049/cit2.12163
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
End-to-end separation algorithms, which perform very well in speech separation, have not yet been applied effectively to music separation. Moreover, because music signals are often dual-channel data with a high sampling rate, modelling long sequences and making good use of the relevant information between channels are also pressing problems. To address these problems, the performance of the end-to-end music separation algorithm is enhanced by improving the network structure. Our main contributions are as follows: (1) A more reasonable densely connected U-Net is designed to capture the long-term characteristics of music, such as the main melody and tone. (2) On this basis, multi-head attention and a dual-path transformer are introduced in the separation module. Channel attention units are applied recursively to the feature map of each layer of the network, enabling the network to perform long-sequence separation. Experimental results show that, after the introduction of channel attention, the proposed algorithm improves consistently over the baseline system. On the MUSDB18 dataset, the average score of the separated audio exceeds that of the current best-performing music separation algorithm based on the time-frequency (T-F) domain.
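The abstract does not specify how its channel attention units are built. As a rough illustration only, the sketch below assumes a squeeze-and-excitation style unit: each channel of a feature map is averaged over time, the averages pass through a small two-layer gate, and each channel is rescaled by its gate value. The function name `channel_attention`, the weight matrices `w1` and `w2`, and the toy input are hypothetical and are not taken from the paper.

```python
import math


def channel_attention(x, w1, w2):
    """Rescale each channel of a feature map by a learned gate (illustrative).

    x  : list of channels, each a list of time samples
    w1 : hidden x channels weight matrix (assumed, not from the paper)
    w2 : channels x hidden weight matrix (assumed, not from the paper)
    """
    # Squeeze: one summary value per channel (mean over time).
    squeezed = [sum(ch) / len(ch) for ch in x]
    # Excite, step 1: linear layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excite, step 2: linear layer followed by a sigmoid, giving gates in (0, 1).
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every sample of a channel by that channel's gate.
    scaled = [[g * v for v in ch] for g, ch in zip(gates, x)]
    return scaled, gates


# Toy feature map with 3 channels and 2 time steps.
x = [[1.0, 3.0], [-2.0, -2.0], [0.5, 0.5]]
w1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
w2 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
scaled, gates = channel_attention(x, w1, w2)
```

Because the gate is computed from a time average, its cost does not grow with the attention window, which is one plausible reason such units suit long sequences. In a network like the one described, a unit of this kind would presumably be applied to the feature map of every layer, with the weights learned during training.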
Pages: 355-363
Number of pages: 9