AUDIO CODING BASED ON SPECTRAL RECOVERY BY CONVOLUTIONAL NEURAL NETWORK

Cited by: 0
Authors
Shin, Seong-Hyeon [1]
Beack, Seung Kwon [2]
Lee, Taejin [2]
Park, Hochong [1]
Affiliations
[1] Kwangwoon University, Seoul, South Korea
[2] Electronics and Telecommunications Research Institute, Daejeon, South Korea
Keywords
audio coding; convolutional neural network; spectral recovery; transform coding; speech
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This study proposes a new audio coding method based on spectral recovery that can enhance the performance of transform audio coding. The encoder represents the spectral information of an input signal in the time-frequency domain and transmits only a portion of it, so that the remaining spectral information can be recovered from the transmitted portion. The decoder recovers the magnitudes of the missing spectral information using a convolutional neural network. The signs of the missing spectral information are either transmitted or randomly assigned, according to their importance. By combining transmission and recovery of spectral information, the proposed method improves coding performance compared with conventional transform coding. A subjective performance evaluation shows that, for mono coding at 39.4 kbps, the proposed method provides higher sound quality than USAC, by an average of 8.5 points in MUSHRA score.
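The transmit-and-recover flow described in the abstract can be illustrated with a small sketch. This is not the authors' implementation; several parts are assumptions made only for illustration. It works on a single 1-D frame rather than a time-frequency representation, it transmits every second coefficient (the abstract does not say which portion is sent), it uses a hypothetical magnitude threshold as the "importance" test for signs, and it replaces the convolutional neural network with a simple neighbour average (`predict_magnitude`, a placeholder name).

```python
import random


def encode(spectrum, keep_every=2, sign_threshold=0.5):
    """Split one frame into transmitted coefficients and side-information signs.

    Assumptions (not from the paper): every `keep_every`-th coefficient is
    transmitted, and a missing coefficient counts as "important" when its
    magnitude is at least `sign_threshold`.
    """
    transmitted = {}  # index -> full coefficient value
    signs = {}        # index -> +1 or -1, only for important missing coefficients
    for i, c in enumerate(spectrum):
        if i % keep_every == 0:
            transmitted[i] = c
        elif abs(c) >= sign_threshold:
            signs[i] = 1 if c >= 0 else -1
    return transmitted, signs, len(spectrum)


def predict_magnitude(transmitted, i, n):
    """Placeholder for the CNN: average magnitude of the nearest transmitted
    coefficient on each side of index `i`."""
    left = next((abs(transmitted[j]) for j in range(i - 1, -1, -1)
                 if j in transmitted), None)
    right = next((abs(transmitted[j]) for j in range(i + 1, n)
                  if j in transmitted), None)
    known = [m for m in (left, right) if m is not None]
    return sum(known) / len(known) if known else 0.0


def decode(transmitted, signs, n, rng):
    """Rebuild the frame: copy transmitted values, predict missing magnitudes,
    and use the transmitted sign when present or a random sign otherwise."""
    out = []
    for i in range(n):
        if i in transmitted:
            out.append(transmitted[i])
            continue
        magnitude = predict_magnitude(transmitted, i, n)
        if i in signs:
            sign = signs[i]
        else:
            sign = rng.choice((-1, 1))
        out.append(sign * magnitude)
    return out


if __name__ == "__main__":
    frame = [1.0, -0.8, 0.6, 0.1, -0.4, -0.9]
    sent, sent_signs, length = encode(frame)
    print(decode(sent, sent_signs, length, random.Random(0)))
```

In this sketch the bit savings come from sending only half of the coefficients plus a few sign bits; in the described method the quality of the result would depend on how well the trained network predicts the missing magnitudes, which the neighbour average here does not attempt to reproduce.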
Pages: 725-729
Number of pages: 5