Attention-based neural network for end-to-end music separation

Cited by: 4
Authors
Wang, Jing [1,5]
Liu, Hanyue [1 ]
Ying, Haorong [1 ]
Qiu, Chuhan [2 ]
Li, Jingxin [3 ]
Anwar, Muhammad Shahid [4,6]
Affiliations
[1] Beijing Inst Technol, Beijing, Peoples R China
[2] Commun Univ China, Beijing, Peoples R China
[3] China Elect Standardizat Inst, Beijing, Peoples R China
[4] Gachon Univ, Seongnam, South Korea
[5] Beijing Inst Technol, Beijing 100081, Peoples R China
[6] Gachon Univ, Seongnam 13120, South Korea
Keywords
channel attention; densely connected network; end-to-end music separation
DOI
10.1049/cit2.12163
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
End-to-end separation algorithms, which perform strongly in speech separation, have not yet been applied effectively to music separation. Moreover, because music signals are typically dual-channel data with a high sampling rate, modelling long sequences and exploiting the information shared between channels remain open problems. To address these issues, the performance of end-to-end music separation is improved by redesigning the network structure. The main contributions are as follows: (1) a densely connected U-Net is designed to better capture long-term characteristics of music, such as the main melody and tone; (2) on this basis, multi-head attention and a dual-path transformer are introduced into the separation module, and channel attention units are applied recursively to the feature map of each network layer, enabling long-sequence separation. Experimental results show that introducing channel attention yields a consistent improvement over the baseline system. On the MUSDB18 dataset, the average score of the separated audio exceeds that of the current best-performing music separation algorithm operating in the time-frequency (T-F) domain.
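To make the "channel attention units applied recursively on the feature map of each layer" more concrete, the following is a minimal illustrative sketch only: a squeeze-and-excitation style channel-attention unit in PyTorch that re-weights the channels of one layer's feature map. The class name, reduction ratio, and tensor shapes are assumptions made for illustration and are not taken from the paper's implementation.

```python
# Illustrative sketch, not the authors' code: one common way a channel
# attention unit can be realised for a 1-D (waveform-domain) feature map.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Squeeze: global average pooling over the time axis.
        self.pool = nn.AdaptiveAvgPool1d(1)
        # Excitation: bottleneck MLP producing one gate per channel.
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time) feature map from one network layer.
        b, c, _ = x.shape
        gate = self.fc(self.pool(x).view(b, c)).view(b, c, 1)
        # Re-weight each channel of the feature map; such a unit could be
        # applied at every layer, as the abstract describes.
        return x * gate

if __name__ == "__main__":
    feats = torch.randn(2, 64, 16000)   # dummy encoder features
    att = ChannelAttention(64)
    print(att(feats).shape)             # torch.Size([2, 64, 16000])
```

In this sketch the attention operates purely on the channel dimension; the long-sequence modelling mentioned in the abstract would instead be handled by the multi-head attention and dual-path transformer in the separation module.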
Pages: 355-363 (9 pages)