Mixed wideband speech and music coding using a speech/music discriminator

Times cited: 0

Author:

Qiao, RY [1]

Affiliation:

[1] CSIRO, Epping, NSW 2121, Australia

Source:

IEEE TENCON'97 - IEEE REGION 10 ANNUAL CONFERENCE, PROCEEDINGS, VOLS 1 AND 2: SPEECH AND IMAGE TECHNOLOGIES FOR COMPUTING AND TELECOMMUNICATIONS | 1997

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In multimedia applications such as videoconferencing, users are demanding higher-quality speech/audio transmission than the plain old telephone service (POTS) can offer. Wideband speech/audio with a 7 kHz bandwidth offers a good compromise between bandwidth and sound quality: it improves the intelligibility and naturalness of speech and adds a feeling of transparent communication. Currently, the only international standard for coding such signals is the G.722 wideband speech/audio coder. While its coding quality is satisfactory, its bit rate leaves much to be desired. CELP-based approaches have been very successful in telephone-bandwidth speech coding, but they are not suitable for coding non-speech signals because of the signal production model they assume. This paper proposes an alternative approach to mixed speech/music coding that uses a discriminator to separate music signals from speech and codes them with the G.722 coder and a G.723.1-based speech coder, respectively. Simulations show very promising results.
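The routing idea in the abstract (classify the input, then send music to the G.722 coder and speech to a G.723.1-based coder) can be sketched roughly as follows. The abstract does not say which features the discriminator uses, so this sketch substitutes a hypothetical rule based on the fraction of low-energy frames, a feature often used for speech/music discrimination. The 16 kHz sampling rate, the 30 ms (480-sample) frame length, the threshold, and all function names are assumptions for illustration; no actual coding is performed.

```python
import math
from typing import List, Sequence

# Assumptions (not from the paper): 16 kHz sampling for 7 kHz wideband audio
# and 30 ms frames (480 samples), borrowing G.723.1's 30 ms frame length.
SAMPLE_RATE = 16000
FRAME_LEN = 480
LOW_ENERGY_THRESHOLD = 0.2  # hypothetical decision threshold


def split_frames(signal: Sequence[float], frame_len: int = FRAME_LEN) -> List[Sequence[float]]:
    """Split a signal into consecutive non-overlapping frames, dropping a short tail."""
    return [signal[i:i + frame_len] for i in range(0, len(signal) - frame_len + 1, frame_len)]


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def low_energy_ratio(frames: List[Sequence[float]]) -> float:
    """Fraction of frames whose RMS is below half the segment's mean RMS."""
    if not frames:
        raise ValueError("need at least one frame")
    rms = [frame_rms(f) for f in frames]
    mean_rms = sum(rms) / len(rms)
    if mean_rms == 0.0:
        return 1.0  # an all-silent segment counts as entirely low-energy
    return sum(1 for r in rms if r < 0.5 * mean_rms) / len(rms)


def classify_segment(frames: List[Sequence[float]],
                     threshold: float = LOW_ENERGY_THRESHOLD) -> str:
    """Hypothetical discriminator: speech has frequent pauses between syllables,
    so many of its frames are low-energy; music tends to be more stationary."""
    return "speech" if low_energy_ratio(frames) > threshold else "music"


def route(frames: List[Sequence[float]]) -> str:
    """Pick a coder for the segment, mirroring the abstract's speech/music split."""
    if classify_segment(frames) == "speech":
        return "G.723.1-based speech coder"
    return "G.722 coder"


if __name__ == "__main__":
    # Speech-like toy input: 30 ms bursts alternating with 30 ms of silence.
    bursty = [0.5 if (n // FRAME_LEN) % 2 == 0 else 0.0 for n in range(10 * FRAME_LEN)]
    # Music-like toy input: a steady 440 Hz tone.
    tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(10 * FRAME_LEN)]
    print(route(split_frames(bursty)))  # prints "G.723.1-based speech coder"
    print(route(split_frames(tone)))    # prints "G.722 coder"
```

Deciding per segment rather than per sample keeps the switching between the two coders infrequent; a practical discriminator would likely combine several features and smooth its decisions over time, which this toy rule does not attempt.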


Pages: 605-608

Page count: 4
