Detecting breathing sounds in realistic Japanese telephone conversations and its application to automatic speech recognition

Cited by: 9
Authors
Fukuda, Takashi [1]
Ichikawa, Osamu [1]
Nishimura, Masafumi [2]
Affiliations
[1] IBM Res AI, Chuo Ku, Nihonbashi Hakozaki Cho, Tokyo 1038510, Japan
[2] Shizuoka Univ, Suruga Ku, Shizuoka 4228017, Japan
Keywords
Breath-event detection; Spontaneous speech; Speech phrasing; Voice activity detection; Automatic speech recognition; FEATURES
DOI
10.1016/j.specom.2018.01.008
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Non-verbal sound detection has long attracted attention in the speech analytics field. Although detecting laughter, coughs, and lip smacking has been well studied in the literature, breath-event detection has not been investigated much despite the need for doing so. Breath events are highly correlated with major prosodic breaks, meaning that the positions of breath events can be used as a delimiter of utterances in combination with a voice activity detection (VAD) technique. Silence intervals approximately 20 ms long right before and after breathing sounds, called "edges", are clearly observed in speech signals. In the literature, capturing the edges is shown to be very effective in reducing false alarms in the detection of breath events. However, the edges often disappear when breaths are taken in spontaneous speech. In this work, we focus on the robustness of breath-event detection in spontaneous speech. The breath detection method we have developed leverages acoustic information that is specialized for breathing sounds, leading to a two-step approach that can detect breath events with an accuracy of 97.4%. We also propose splitting unsegmented speech signals into semantically grouped utterances by leveraging the breath events. The speech segmentation based on accurate breath-event detection provided a 3.8% relative error reduction in automatic speech recognition (ASR).
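The segmentation idea in the abstract, using breath-event positions as utterance delimiters alongside VAD, can be illustrated with a minimal sketch. The abstract does not say how the two signals are combined, so everything here is an assumption: the function name `split_at_breaths` is hypothetical, and both the VAD speech regions and the detected breath events are assumed to be already available as (start, end) intervals in seconds. It is not the authors' implementation.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec); assumed representation


def split_at_breaths(speech: List[Interval],
                     breaths: List[Interval]) -> List[Interval]:
    """Cut each VAD speech interval at the breath events that overlap it.

    Hypothetical helper: the breath intervals are dropped and the
    remaining pieces are returned as candidate utterances.
    """
    utterances: List[Interval] = []
    ordered_breaths = sorted(breaths)
    for s_start, s_end in speech:
        cursor = s_start  # start of the utterance currently being built
        for b_start, b_end in ordered_breaths:
            if b_end <= cursor or b_start >= s_end:
                continue  # breath lies outside the remaining speech span
            if b_start > cursor:
                utterances.append((cursor, b_start))
            # Resume after the breath, but never past the speech interval.
            cursor = max(cursor, min(b_end, s_end))
        if cursor < s_end:
            utterances.append((cursor, s_end))
    return utterances


if __name__ == "__main__":
    vad_regions = [(0.0, 10.0)]             # illustrative values only
    breath_events = [(3.0, 3.4), (7.0, 7.5)]
    print(split_at_breaths(vad_regions, breath_events))
```

With the illustrative values above, one 10-second VAD region is cut into three pieces at the two breath events. A real system would also need the breath detector itself, which this sketch leaves out.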
Pages: 95-103
Number of pages: 9