Rhythm detection for speech-music discrimination in MPEG compressed domain

Cited by: 3

Authors:

Jarina, R [1]

O'Connor, N [1]

Marlow, S [1]

Murphy, N [1]

Affiliation:

[1] Dublin City Univ, Ctr Digital Video Proc, Dublin 9, Ireland

Source:

DSP 2002: 14TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING PROCEEDINGS, VOLS 1 AND 2 | 2002

Keywords:

DOI:

10.1109/ICDSP.2002.1027851

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

A novel approach to speech-music discrimination based on rhythm (or beat) detection is introduced. Rhythmic pulses are detected by applying a long-term autocorrelation method to band-passed signals. This approach is combined with a second one whose features describe the energy peaks of the signal. The discriminator uses just three features, all computed from data taken directly from an MPEG-1 bitstream. It was tested on more than 3 hours of audio data, with an average recognition rate of 97.7%.
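The abstract does not spell out the autocorrelation step, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes a band-limited energy envelope is already available (for example, one value per audio frame); the function name `rhythm_strength`, the `frame_rate` parameter, and the 0.25 to 2.0 second search range are all hypothetical choices made for this example.

```python
import numpy as np

def rhythm_strength(envelope, frame_rate, min_period_s=0.25, max_period_s=2.0):
    """Estimate the dominant pulse period of an energy envelope.

    Hypothetical helper: returns (period_in_seconds, peak_value), where
    peak_value is the normalized autocorrelation at the best lag.
    A high peak suggests a regular beat (music-like); a low peak suggests none.
    """
    x = np.asarray(envelope, dtype=float)
    x = x - x.mean()                      # remove DC so silence does not correlate
    energy = float(np.dot(x, x))
    if energy == 0.0:
        return None, 0.0                  # constant envelope: no rhythm to detect

    # Autocorrelation over non-negative lags, normalized so that lag 0 equals 1.
    ac = np.correlate(x, x, mode="full")[len(x) - 1:] / energy

    # Restrict the search to lags matching plausible beat periods (assumed range).
    lo = max(1, int(round(min_period_s * frame_rate)))
    hi = min(len(x) - 1, int(round(max_period_s * frame_rate)))
    if lo > hi:
        return None, 0.0                  # envelope too short for the lag range

    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    return lag / frame_rate, float(ac[lag])


if __name__ == "__main__":
    rate = 100                            # assumed envelope rate, frames per second
    pulses = np.zeros(1000)
    pulses[::50] = 1.0                    # synthetic beat every 0.5 s
    print(rhythm_strength(pulses, rate))
```

A long analysis window (several seconds) is what makes the autocorrelation "long-term": it has to span multiple beat periods before a periodic peak can stand out from the background.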

Citation

Pages: 129-132

Page count: 4

50 items in total

[31] Semi-supervised Single-Channel Speech-Music Separation for Automatic Speech Recognition
Demir, Cemil
Cemgil, A. Taylan
Saraclar, Murat
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 688 - +
[32] Scene change detection using shape information in MPEG-4 compressed domain
Park, IS
Shin, DH
Park, RH
CISST '04: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON IMAGING SCIENCE, SYSTEMS, AND TECHNOLOGY, 2004, : 534 - 540
[33] ANALYSIS OF EFFECT OF SINGLE-CHANNEL SPEECH-MUSIC SEPARATION USING NMF TO AUTOMATIC SPEECH RECOGNITION
Demir, Cemil
Cemgil, A. Taylan
Saraclar, Murat
2014 22ND SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2014, : 1818 - 1821
[34] Detection of AMR double compression using compressed-domain speech features
Sampaio, Jose F. P.
Nascimento, Francisco A. de O.
FORENSIC SCIENCE INTERNATIONAL-DIGITAL INVESTIGATION, 2020, 33
[35] MUSIC TONALITY FEATURES FOR SPEECH/MUSIC DISCRIMINATION
Sell, Gregory
Clark, Pascal
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
[36] Music and speech prosody: a common rhythm
Hausen, Maija
Torppa, Ritva
Salmela, Viljami R.
Vainio, Martti
Sarkamo, Teppo
FRONTIERS IN PSYCHOLOGY, 2013, 4
[37] Cortical tracking of rhythm in music and speech
Harding, Eleanor E.
Sammler, Daniela
Henry, Molly J.
Large, Edward W.
Kotz, Sonja A.
NEUROIMAGE, 2019, 185 : 96 - 101
[38] Compressed domain MPEG-2 video editing
Wang, K
Woods, JW
2000 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, PROCEEDINGS VOLS I-III, 2000, : 225 - 228
[39] Semantic video summarization in compressed domain MPEG video
Yu, JCS
Kankanhalli, MS
Mulhem, P
2003 INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOL III, PROCEEDINGS, 2003, : 329 - 332
[40] Generation of the MPEG-7 descriptor in compressed domain
Jin, SH
Kim, CS
Kang, HK
Ro, YM
IMAGE PROCESSING: ALGORITHMS AND SYSTEMS, 2002, 4667 : 160 - 169
