A perceptual subspace approach for modeling of speech and audio signals with damped sinusoids

Cited by: 27
Authors
Jensen, J [1]
Heusdens, R
Jensen, SH
Affiliations
[1] Delft Univ Technol, Dept Mediamat, NL-2628 CD Delft, Netherlands
[2] Aalborg Univ, Dept Commun Technol, DK-9220 Aalborg, Denmark
Source
Keywords
complex exponentials; perceptually relevant sinusoids; psycho-acoustical distortion measure; sinusoidal modeling; speech and audio processing; subspace-based signal analysis;
DOI
10.1109/TSA.2003.819948
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
The problem of modeling a signal segment as a sum of exponentially damped sinusoidal components arises in many different application areas, including speech and audio processing. Often, model parameters are estimated using subspace-based techniques, which arrange the input signal in a structured matrix and exploit the so-called shift-invariance property of certain vector spaces of that matrix. A problem with this class of estimation algorithms, when used for speech and audio processing, is that the perceptual importance of the sinusoidal components is not taken into account. In this work we propose a solution to this problem. In particular, we show how to combine well-known subspace-based estimation techniques with a recently developed perceptual distortion measure in order to obtain an algorithm for extracting perceptually relevant model components. In analysis-synthesis experiments with wideband audio signals, objective and subjective evaluations show that the proposed algorithm improves perceived signal quality considerably over traditional subspace-based analysis methods.
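As an illustration of the conventional subspace step the abstract refers to, the following is a minimal sketch, not the authors' algorithm: it arranges a segment in a Hankel matrix, takes the dominant left singular vectors, and uses the shift-invariance relation to recover the poles of the damped sinusoids. The function name `estimate_damped_sinusoids`, its parameters, and the choice of a Hankel matrix with a least-squares solve are assumptions made for this example. The perceptual distortion measure described in the abstract is not modeled here.

```python
import numpy as np


def estimate_damped_sinusoids(x, order, rows=None):
    """Estimate poles and complex amplitudes of a sum of damped complex
    exponentials, x[n] = sum_k a_k * z_k**n, with a basic subspace method.

    Hypothetical helper for illustration; it contains no perceptual weighting.
    """
    x = np.asarray(x, dtype=complex)
    n = len(x)
    num_rows = rows if rows is not None else n // 2

    # Structured (Hankel) data matrix: H[i, j] = x[i + j].
    idx = np.arange(num_rows)[:, None] + np.arange(n - num_rows + 1)[None, :]
    hankel = x[idx]

    # Signal subspace: the `order` dominant left singular vectors.
    u, _, _ = np.linalg.svd(hankel, full_matrices=False)
    us = u[:, :order]

    # Shift invariance: us without its first row equals us without its last
    # row times a small matrix whose eigenvalues are the poles z_k.
    phi = np.linalg.lstsq(us[:-1], us[1:], rcond=None)[0]
    poles = np.linalg.eigvals(phi)

    # Complex amplitudes by least squares on the Vandermonde matrix of poles.
    vandermonde = poles[None, :] ** np.arange(n)[:, None]
    amps = np.linalg.lstsq(vandermonde, x, rcond=None)[0]
    return poles, amps
```

For each estimated pole `z`, the damping factor is `-np.log(np.abs(z))` and the angular frequency is `np.angle(z)` in radians per sample. A real-valued segment with K damped sinusoids would need `order = 2 * K`, since each real sinusoid contributes a complex-conjugate pole pair.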
Pages: 121-132
Number of pages: 12
Related papers
50 items in total
  • [21] Speech signals separation: A new approach exploiting the coherence of audio and visual speech
    Girin, L
    Allard, A
    Schwartz, JL
    2001 IEEE FOURTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2001, : 631 - 636
  • [22] Neural-Based Approach to Perceptual Sparse Coding of Audio Signals
    Pichevar, Ramin
    Najaf-Zadeh, Hossein
    Mustiere, Frederic
    2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010, 2010,
  • [23] Equalization of speech and audio signals using a nonlinear dynamical approach
    Chan, AM
    Leung, H
    IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 1999, 7 (03): 356 - 360
  • [24] AUDIO CLASSIFICATION OF MUSIC/SPEECH MIXED SIGNALS USING SINUSOIDAL MODELING WITH SVM AND NEURAL NETWORK APPROACH
    Mowlaee, Pejman
    Sayadiyan, Abolghasem
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2013, 22 (02)
  • [25] PERCEPTUAL SUBSPACE SPEECH ENHANCEMENT WITH SSDR NORMALIZATION
    Surendran, Sudeep
    Kumar, T. Kishore
    2016 INTERNATIONAL CONFERENCE ON MICROELECTRONICS, COMPUTING AND COMMUNICATIONS (MICROCOM), 2016,
  • [26] Perceptual Subspace Speech Enhancement with Variance Normalization
    Surendran, Sudeep
    Kumar, T. Kishore
    ELEVENTH INTERNATIONAL CONFERENCE ON COMMUNICATION NETWORKS, ICCN 2015/INDIA ELEVENTH INTERNATIONAL CONFERENCE ON DATA MINING AND WAREHOUSING, ICDMW 2015/NDIA ELEVENTH INTERNATIONAL CONFERENCE ON IMAGE AND SIGNAL PROCESSING, ICISP 2015, 2015, 54 : 818 - 828
  • [27] Variance normalized perceptual subspace speech enhancement
    Surendran, Sudeep
    Kumar, T. Kishore
    AEU-INTERNATIONAL JOURNAL OF ELECTRONICS AND COMMUNICATIONS, 2017, 74 : 44 - 54
  • [28] Log amplitude modeling of sinusoids in voiced speech
    Malik, N
    Holmes, WH
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 465 - 468
  • [29] Perceptual Models for Speech, Audio, and Music Processing
    Allen, Jont B.
    Chan, Wai-Yip Geoffrey
    Voran, Stephen
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2007, 2007 (1)
  • [30] Wideband speech and audio coding in the perceptual domain
    Lin, L
    Ambikairajah, E
    Holmes, WH
    ADVANCED SIGNAL PROCESSING FOR COMMUNICATION SYSTEMS, 2002, 703 : 15 - 30