Time-scale modification of audio signals using enhanced WSOLA with management of transients

Cited by: 25

Authors:

Grofit, Shahaf [1]

Lavner, Yizhar [2,3]

Affiliations:

[1] Tel Aviv Univ, Sch Comp Sci, IL-69978 Tel Aviv, Israel

[2] Tel Hai Acad Coll, Dept Comp Sci, IL-12210 Upper Galilee, Israel

[3] Technion Israel Inst Technol, Fac Elect Engn, SIPL, IL-32000 Haifa, Israel

Source:

IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2008, Vol. 16, No. 01

Keywords:

Mel frequency cepstrum; spectral variation; time-scale modification of audio and music signals; waveform similarity overlap-and-add (WSOLA)

DOI:

10.1109/TASL.2007.909444

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

In this paper, we present an algorithm for time-scale modification of music signals, based on the waveform similarity overlap-and-add technique (WSOLA). A well-known disadvantage of the standard WSOLA is the uniform time-scaling of the entire signal, including the perceptually significant transient sections (PSTs), where temporal envelope changes as well as significant spectral transitions occur. Time-scaling of PSTs can severely degrade the music quality. We address this problem by detecting the PSTs and leaving them intact, while time-scaling the remainder of the signal, which is relatively steady-state. In the proposed algorithm, the PSTs are detected using a Mel frequency cepstrum nonstationarity measure and the normalized cross-correlation, with time-varying threshold functions. Our study shows that the accurate detection of PSTs within the WSOLA framework makes it possible to achieve a higher quality of time-scaled music, as confirmed by subjective listening tests.
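As a rough illustration of the baseline that the abstract builds on, the following is a minimal sketch of plain WSOLA, assuming NumPy. It applies uniform time-scaling to the whole signal, which is exactly the behaviour the paper identifies as a weakness. The PST detection described above (Mel frequency cepstrum nonstationarity measure and time-varying thresholds) is not reproduced here. The function names `normalized_cross_correlation` and `wsola` and the parameters `frame_len` and `tolerance` are illustrative choices, not taken from the paper.

```python
import numpy as np


def normalized_cross_correlation(a, b):
    """Normalized cross-correlation of two equal-length frames (0.0 if either is silent)."""
    denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


def wsola(x, alpha, frame_len=1024, tolerance=256):
    """Time-stretch x by factor alpha (>1 is slower) using plain WSOLA.

    Illustrative sketch only: every frame is scaled uniformly, with no
    special handling of transient sections.
    """
    x = np.asarray(x, dtype=float)
    hop_out = frame_len // 2  # synthesis hop, 50% overlap
    # Periodic Hann window: overlapping halves sum to one at 50% overlap.
    window = np.hanning(frame_len + 1)[:frame_len]
    n_frames = int(np.ceil(len(x) * alpha / hop_out))

    # Zero-pad so that every candidate segment stays inside the array.
    xp = np.concatenate([
        np.zeros(tolerance),
        x,
        np.zeros(frame_len + hop_out + 2 * tolerance),
    ])
    y = np.zeros(n_frames * hop_out + frame_len)

    prev = tolerance  # start (in xp) of the previously copied segment
    for k in range(n_frames):
        # Ideal analysis position for uniform time-scaling.
        ideal = tolerance + int(round(k * hop_out / alpha))
        if k == 0:
            start = ideal
        else:
            # The natural continuation of the previously copied segment.
            target = xp[prev + hop_out: prev + hop_out + frame_len]
            # Pick the segment near the ideal position that best matches it.
            candidates = range(ideal - tolerance, ideal + tolerance + 1)
            start = max(
                candidates,
                key=lambda s: normalized_cross_correlation(
                    xp[s:s + frame_len], target
                ),
            )
        # Overlap-add the windowed segment at the synthesis position.
        y[k * hop_out: k * hop_out + frame_len] += window * xp[start:start + frame_len]
        prev = start

    return y[:int(round(len(x) * alpha))]
```

For example, `wsola(x, 1.5)` returns a signal about 1.5 times as long as `x` with roughly the same pitch. A transient-aware variant in the spirit of the abstract would copy detected PST regions unchanged and apply the scaling only to the remaining steady-state portions.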


Pages: 106-115

Page count: 10

50 records in total (entries 31 to 40 shown below)

[31] Voice privacy using CycleGAN and time-scale modification
Prajapati, Gauri P.
Singh, Dipesh K.
Amin, Preet P.
Patil, Hemant A.
[J]. COMPUTER SPEECH AND LANGUAGE, 2022, 74
[32] Voice Privacy Using Time-Scale and Pitch Modification
Singh D.K.
Prajapati G.P.
Patil H.A.
[J]. SN Computer Science, 5 (2)
[33] Histogram-based audio watermarking against time-scale modification and cropping attacks
Xiang, Shijun
Huang, Jiwu
[J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2007, 9 (07) : 1357 - 1372
[34] Audio watermarking robust against time-scale modification and MP3 compression
Xiang, Shijun
Kim, Hyoung Joong
Huang, Jiwu
[J]. SIGNAL PROCESSING, 2008, 88 (10) : 2372 - 2387
[35] Time-scale modification of speech signals, for language-learning impaired children
Erogul, O
Karagoz, I
[J]. PROCEEDINGS OF THE 1998 2ND INTERNATIONAL CONFERENCE BIOMEDICAL ENGINEERING DAYS, 1998, : 33 - 35
[36] Time-Scale Invariant Audio Data Embedding
Mohamed F. Mansour
Ahmed H. Tewfik
[J]. EURASIP Journal on Advances in Signal Processing, 2003
[37] Time-scale invariant audio data embedding
Mansour, MF
Tewfik, AH
[J]. EURASIP JOURNAL ON APPLIED SIGNAL PROCESSING, 2003, 2003 (10) : 993 - 1000
[38] Analysis of superimposed signals using time-scale phase representation
Sostaric, A
Zazula, D
[J]. PROCEEDINGS OF THE IEEE-SP INTERNATIONAL SYMPOSIUM ON TIME-FREQUENCY AND TIME-SCALE ANALYSIS, 1998, : 505 - 508
[39] Speech Time-Scale Modification With GANs
Cohen, Eyal
Kreuk, Felix
Keshet, Joseph
[J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 1067 - 1071
[40] Variable time-scale modification of speech using transient information
Lee, SJ
Kim, HD
Kim, HS
[J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS, 1997, : 1319 - 1322
