MULTI-SPEAKER, NARROWBAND, CONTINUOUS MARATHI SPEECH DATABASE

Authors
Godambe, Tejas [1]
Bondale, Nandini [1]
Samudravijaya, K. [1]
Rao, Preeti [2]
Affiliations
[1] Tata Inst Fundamental Res, Sch Technol & Comp Sci, Homi Bhabha Rd, Bombay 400005, Maharashtra, India
[2] Indian Inst Technol, Dept Elect Engn, Bombay, Maharashtra, India
Keywords
speech recognition; speech data; Marathi; transcription
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We describe the development of a continuous speech database in the Marathi language. Speech data was collected from about 1500 literate speakers from 34 districts of Maharashtra, varying in age group, gender, mother tongue, and educational qualification. The subjects called the data acquisition system from their personal mobile handsets and read specially designed sentence sets. Data acquisition was conducted in the field rather than in a quiet environment. As a result, the acquired speech data captured a large amount of nonspeech sounds as well as incompletely spoken words. The speech data was therefore transcribed using additional labels to denote frequently occurring nonspeech sounds, different kinds of incomplete words, and invalid words. We characterize the database in terms of statistics such as the gender distribution of speakers, phonemic richness, the amount of nonspeech sounds, and average sentence and word lengths for both reference and actual sentences.
Pages: 6