Design of an efficient music-speech discriminator

Cited by: 4
Authors
Tardon, Lorenzo J. [1 ]
Sammartino, Simone [1 ]
Barbancho, Isabel [1 ]
Affiliations
[1] Univ Malaga, ETS Ingn Telecomunicac, Dept Ingn Comunicac, E-29071 Malaga, Spain
Source
Keywords
CLASSIFICATION; RECOGNITION;
DOI
10.1121/1.3257204
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403;
Abstract
This paper addresses the design of a simple and efficient music-speech discriminator for large audio data sets in which advanced music playing techniques are taught and voice and music are intrinsically interleaved. In the process, a number of features used in speech-music discrimination are defined and evaluated over the available data set. Specifically, the data set contains pieces of classical music played with different and unspecified instruments (or even lyrics) and the voice of a teacher (a top music performer) or even the overlapped voice of the translator and other persons. After an initial test of the performance of the implemented features, a selection process is started, which takes into account the type of classifier selected beforehand, to achieve good discrimination performance and computational efficiency, as shown in the experiments. The discrimination application has been defined and tested on a large data set supplied by Fundacion Albeniz, containing a large variety of classical music pieces played with different instruments, which include comments and speeches of famous performers. © 2010 Acoustical Society of America. [DOI: 10.1121/1.3257204]
Pages: 271-279
Page count: 9