Automatic utterance segmentation tool for speech corpus

Cited by: 0
Authors
Ozawa, Mitsuhiro [1]
Tsuge, Satoru [2]
Shishibori, Masami [2]
Kita, Kenji [3]
Fukumi, Minoru [2]
Ren, Fuji [4]
Kuroiwa, Shingo [2]
Affiliations
[1] Univ Tokushima, Grad Sch Adv Technol & Sci, Tokushima, Japan
[2] Univ Tokushima, Inst Technol & Sci, Tokushima, Japan
[3] Univ Tokushima, Ctr Adv Informat Technol, Tokushima, Japan
[4] Univ Tokushima, Beijing Univ Posts & Telecommun, Inst Technol & Sci, Tokushima, Japan
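Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code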
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We have recently been collecting speech data to investigate intra-speaker speech variability over short and long time periods. To reduce the burden on speakers, the speech data are generally recorded as a single file from the start of collection to the end. Such a file therefore contains noise, non-speech sections, and misspoken sections, so it must be segmented into individual utterances from which the useful ones are selected. This process requires a great deal of time and effort. In this paper, we propose an automatic utterance segmentation tool for dividing the collected speech data. The proposed tool consists of four processes: voice activity detection, speech recognition, DP matching, and correction of the speech sections. To evaluate the proposed tool, we conducted experiments using one female speaker's speech data from our corpus. The experimental results show that the proposed method reduces filing time by 90% compared with manual filing.

In this paper, we first introduced the large speech corpus, which contains speech data collected from specific speakers over long and short time periods. We then described the automatic utterance segmentation tool that we built for constructing the corpus, and examined its validity. The results demonstrated that the tool achieves high performance and that it simplifies the construction of the speech corpus.
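The abstract names DP matching as one of the four processes but does not give its details. The sketch below is only an illustration of the general idea, under the assumption that each voice-activity segment has already been recognized as a list of words and that a list of expected prompts is available. The names `dp_distance`, `label_segments`, and the `max_error_rate` threshold are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float, List[str]]  # (start_sec, end_sec, recognized words)


def dp_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Edit distance between two word sequences, computed by dynamic programming."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # cost[i][j] is the cheapest way to turn ref[:i] into hyp[:j].
    cost = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        cost[i][0] = i  # delete every word of ref[:i]
    for j in range(cols):
        cost[0][j] = j  # insert every word of hyp[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(substitution, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    return cost[-1][-1]


def label_segments(
    prompts: List[List[str]],
    segments: List[Segment],
    max_error_rate: float = 0.3,  # assumed threshold, chosen only for illustration
) -> List[Tuple[float, float, int]]:
    """Give each segment the index of its closest prompt, or -1 if none is close enough."""
    labels = []
    for start, end, words in segments:
        rates = [dp_distance(p, words) / max(len(p), 1) for p in prompts]
        best = min(range(len(rates)), key=rates.__getitem__)
        labels.append((start, end, best if rates[best] <= max_error_rate else -1))
    return labels


if __name__ == "__main__":
    prompts = [["good", "morning", "everyone"], ["thank", "you", "very", "much"]]
    segments = [
        (0.0, 1.2, ["good", "morning", "everyone"]),
        (1.5, 2.0, ["uh"]),  # filler that should be rejected
        (2.4, 3.6, ["thank", "you", "very", "match"]),  # one recognition error
    ]
    for start, end, index in label_segments(prompts, segments):
        print(f"{start:.1f}-{end:.1f}s -> prompt {index}")
```

In this sketch, segments labeled -1 would be the candidates for discarding as noise or misspoken sections, while matched segments could be cut out and filed under their prompt. How the actual tool scores matches and corrects section boundaries is not stated in the abstract.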
Pages: 401 / +
Page count: 2