Constructing multi-level speech database for spontaneous speech processing

Authors
Hahn, M
Kim, S
Lee, JC
Lee, YJ
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper describes a new database, called the multi-level speech database, for spontaneous speech processing. We designed the database to cover textual and acoustic variations ranging from declarative speech to spontaneous speech. The database is composed of 5 categories which are, in order of decreasing spontaneity: spontaneous speech, interview, simulated interview, declarative speech with context, and declarative speech without context. We collected a total of 112 sets from 23 subjects (male: 19, female: 4). The database was first transcribed using 15 transcription symbols according to our own transcription rules. As a second step, prosodic information will be added. The goal of this research is a comparative textual and prosodic analysis at each level and a quantification of the spontaneity of diverse speech databases for dialogue speech synthesis and recognition. A preliminary analysis of the transcribed texts shows that, as expected, spontaneous speech has more corrections, repetitions, and pauses than the other categories. In addition, the average number of sentences per turn is greater for spontaneous speech than for the other categories. These results suggest that the spontaneity of a speech database can be quantified.
Pages: 1930-1933
Number of pages: 4