Time-scale modification of audio signals using enhanced WSOLA with management of transients

Cited by: 25
Authors
Grofit, Shahaf [1]
Lavner, Yizhar [2, 3]
Affiliations
[1] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel
[2] Tel Hai Acad Coll, Dept Comp Sci, IL-12210 Upper Galilee, Israel
[3] Technion Israel Inst Technol, Fac Elect Engn, SIPL, IL-32000 Haifa, Israel
Keywords
Mel frequency cepstrum; spectral variation; time-scale modification of audio and music signals; waveform similarity overlap-and-add (WSOLA)
DOI
10.1109/TASL.2007.909444
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In this paper, we present an algorithm for time-scale modification of music signals, based on the waveform similarity overlap-and-add technique (WSOLA). A well-known disadvantage of the standard WSOLA is the uniform time-scaling of the entire signal, including the perceptually significant transient sections (PSTs), where temporal envelope changes as well as significant spectral transitions occur. Time-scaling of PSTs can severely degrade the music quality. We address this problem by detecting the PSTs and leaving them intact, while time-scaling the remainder of the signal, which is relatively steady-state. In the proposed algorithm, the PSTs are detected using a Mel frequency cepstrum nonstationarity measure and the normalized cross-correlation, with time-varying threshold functions. Our study shows that the accurate detection of PSTs within the WSOLA framework makes it possible to achieve a higher quality of time-scaled music, as confirmed by subjective listening tests.
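The abstract names the normalized cross-correlation as one of the measures involved. As a rough illustration only, the sketch below shows how such a measure can score waveform similarity and pick the best-matching segment within a small search range, which is the general idea behind WSOLA-style alignment. It is not the authors' implementation: the function names, the `center` and `tolerance` parameters, and the search strategy are assumptions made for this example, and the Mel frequency cepstrum measure and time-varying thresholds are omitted because this record gives no details about them.

```python
import math


def normalized_cross_correlation(a, b):
    """Return the normalized cross-correlation of two equal-length sequences.

    The result lies in [-1, 1]; 0.0 is returned if either input has zero energy.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    numerator = sum(x * y for x, y in zip(a, b))
    denominator = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return numerator / denominator if denominator > 0 else 0.0


def best_offset(signal, template, center, tolerance):
    """Hypothetical helper: search offsets in [-tolerance, tolerance] around
    `center` for the segment of `signal` most similar to `template`.

    Returns (offset, score). If no candidate segment fits inside the signal,
    returns (0, -inf).
    """
    n = len(template)
    best_delta, best_score = 0, -math.inf
    for delta in range(-tolerance, tolerance + 1):
        start = center + delta
        if start < 0 or start + n > len(signal):
            continue  # candidate segment would fall outside the signal
        score = normalized_cross_correlation(signal[start:start + n], template)
        if score > best_score:
            best_delta, best_score = delta, score
    return best_delta, best_score


# Illustrative input: a slow chirp, with the template cut 3 samples after `center`.
signal = [math.sin(0.002 * i * i) for i in range(200)]
template = signal[53:83]
offset, score = best_offset(signal, template, center=50, tolerance=10)
```

In this example the search should recover the 3-sample shift, since the template was cut from the signal at that position. A transient-aware method of the kind the abstract describes would additionally decide, per region, whether to time-scale at all.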
Pages: 106-115
Page count: 10
Related papers
50 items in total
  • [31] Voice privacy using CycleGAN and time-scale modification
    Prajapati, Gauri P.
    Singh, Dipesh K.
    Amin, Preet P.
    Patil, Hemant A.
    [J]. COMPUTER SPEECH AND LANGUAGE, 2022, 74
  • [32] Voice Privacy Using Time-Scale and Pitch Modification
    Singh D.K.
    Prajapati G.P.
    Patil H.A.
    [J]. SN Computer Science, 5 (2)
  • [33] Histogram-based audio watermarking against time-scale modification and cropping attacks
    Xiang, Shijun
    Huang, Jiwu
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2007, 9 (07) : 1357 - 1372
  • [34] Audio watermarking robust against time-scale modification and MP3 compression
    Xiang, Shijun
    Kim, Hyoung Joong
    Huang, Jiwu
    [J]. SIGNAL PROCESSING, 2008, 88 (10) : 2372 - 2387
  • [35] Time-scale modification of speech signals, for language-learning impaired children
    Erogul, O
    Karagoz, I
    [J]. PROCEEDINGS OF THE 1998 2ND INTERNATIONAL CONFERENCE BIOMEDICAL ENGINEERING DAYS, 1998, : 33 - 35
  • [36] Time-Scale Invariant Audio Data Embedding
    Mohamed F. Mansour
    Ahmed H. Tewfik
    [J]. EURASIP Journal on Advances in Signal Processing, 2003
  • [37] Time-scale invariant audio data embedding
    Mansour, MF
    Tewfik, AH
    [J]. EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2003, 2003 (10) : 993 - 1000
  • [38] Analysis of superimposed signals using time-scale phase representation
    Sostaric, A
    Zazula, D
    [J]. PROCEEDINGS OF THE IEEE-SP INTERNATIONAL SYMPOSIUM ON TIME-FREQUENCY AND TIME-SCALE ANALYSIS, 1998, : 505 - 508
  • [39] Speech Time-Scale Modification With GANs
    Cohen, Eyal
    Kreuk, Felix
    Keshet, Joseph
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1067 - 1071
  • [40] Variable time-scale modification of speech using transient information
    Lee, SJ
    Kim, HD
    Kim, HS
    [J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS, 1997, : 1319 - 1322